[Remote] Platform Engineer
Note: This is a remote job open to candidates in the USA. HHAeXchange is the leading technology platform for home and community-based care, founded in 2008. They are seeking a Platform Engineer to join their Data & AI Engineering team. The role focuses on platform reliability and delivery automation, ensuring that the infrastructure for their AI platform and data pipelines is stable and scalable.
Responsibilities
- Own availability, latency, and performance targets for AI platform services and data infrastructure running on AWS
- Design and implement monitoring, alerting, and observability frameworks across the platform stack
- Lead incident response, root cause analysis, and post-mortem processes for platform-level outages or degradations
- Define and track SLOs/SLAs for core platform primitives including RAG pipelines, agent orchestration services, and model access layers
- Proactively identify reliability risks and drive engineering improvements before they become production issues
- Build and maintain runbooks, disaster recovery procedures, and operational documentation
- Design, build, and maintain CI/CD pipelines for AI platform components, data pipelines, and internal applications
- Own infrastructure-as-code (IaC) practices across the team using tools such as Terraform or AWS CDK
- Manage and optimize AWS environments including ECS, Lambda, S3, RDS, Redshift, API Gateway, and related services
- Implement and enforce security, compliance, and cost optimization best practices across AWS infrastructure
- Automate deployment, scaling, and configuration management to reduce manual operational overhead
- Partner with AI Platform Engineers to containerize and operationalize AI services and agent frameworks
- Support Data & AI Engineers with environment management, access controls, and deployment tooling for Polaris and data pipeline infrastructure
- Serve as the operational backbone for the AI platform team – ensuring engineers can ship and iterate quickly without being blocked by infrastructure concerns
- Contribute to our 'factory model' vision by making deployment and reliability a repeatable, scalable capability rather than an ad hoc function
- Other duties as assigned by supervisor or HHAeXchange leader
Skills
- 3+ years of professional experience in a DevOps, SRE, or platform engineering role
- Hands-on AWS experience required – AgentCore, Bedrock, ECS, Lambda, S3, RDS, Redshift, CloudWatch, IAM, VPC, and related services
- Experience with infrastructure-as-code tools such as Terraform or AWS CDK
- Strong CI/CD experience with tools such as GitHub Actions
- Experience with containerization and orchestration (Docker, ECS, or Kubernetes)
- Familiarity with AI/ML infrastructure patterns – model serving, vector databases, pipeline orchestration (strongly preferred)
- Experience with observability and monitoring tooling (Datadog, CloudWatch)
- Prior experience in a SaaS environment
- Strong verbal and written communication skills, with the ability to collaborate with technical and non-technical stakeholders
- Self-starter with a proactive approach to identifying and resolving infrastructure risk before it impacts delivery
- Willingness to explore and adopt AI tools responsibly to enhance productivity and innovation in your role
Benefits
- This is a benefits-eligible position.
- HHAeXchange offers competitive health plans
- Paid time-off
- Company paid holidays
- 401(k) retirement program with a company-elected match
- Other company-sponsored programs
Company Overview