[Remote] Site Reliability Engineer

100% Remote Full-time Open now

Note: This is a remote position open to candidates in the USA. TalentDome Staffing is a high-growth, AI-driven narrative intelligence startup seeking a Senior Site Reliability Engineer (SRE) / Infrastructure Engineer. The role requires operational ownership of a production environment, focusing on infrastructure orchestration, high-throughput scaling, and GPU application deployment to support massive data flows.

Responsibilities

  • Infrastructure Orchestration: Maintain, optimize, and expand the core infrastructure, ensuring everything is cleanly declared via Terraform and managed across high-performance Kubernetes clusters
  • High-Throughput Scaling: Design and manage environments capable of sustaining immense data ingestion scaling, high-throughput pipelines, and massive search database operations
  • GPU Application Deployment: Collaborate with the R&D team to successfully deploy, optimize, and manage highly specialized machine learning and AI applications running on GPUs
  • System Optimization & Reliability: Partner closely with backend teams to heavily optimize production Java deployments and Python workflows, guaranteeing maximum uptime, high availability, and seamless scaling
  • Technical Leadership: Serve as a foundational pillar for infrastructure architecture, establishing operational best practices without requiring handholding or micro-management

Skills

  • 8+ years of dedicated, hands-on experience with Kubernetes and Terraform
  • Ideally 15+ years of total technical experience in infrastructure or site reliability engineering
  • Deep architectural mastery of deployment systems, cluster orchestration, and high-availability scaling
  • Proven cloud hosting experience, with strong proficiency in AWS
  • Exposure to or experience with GCP is a significant advantage for supporting R&D workflows
  • Concrete experience deploying and scaling application workflows that interface with GPUs and high-volume data ingestion layers
  • Familiarity with or exposure to optimizing runtime environments for Java and Python applications is highly beneficial
  • Exceptional self-direction and problem-solving capability
  • Professional maturity to eventually step into a formal leadership role as the infrastructure team expands

Benefits

  • True Operational Autonomy: The opportunity to architect and scale greenfield deployments for a rapidly expanding AI data platform.
  • High-Caliber Environment: Collaborate directly with an elite team of backend engineers and machine learning R&D specialists.
  • Flexible, Modern Workspace: Enjoy 100% remote working flexibility across the United States.
  • Open to equity incentives

Company Overview

  • TalentDome is your R&D talent partner in SmartTech across the software development life cycle (SDLC) and the software stack. We connect U.S. It was founded in 2024 and is headquartered in Dallas, Texas, US, with a workforce of 2-10 employees. Its website is https://www.talentdomestaffing.com.