[Remote] Director of Site Reliability Engineering

100% Remote | Full-time | Open now

Note: This is a remote position open to candidates in the USA. Talently, an organization in the Technology, Information and Media industry, is seeking a Director of Site Reliability Engineering. In this role, you will build and lead Site Reliability Engineering practices, drive strategic reliability initiatives, and mentor engineering teams in a remote-first environment.

Responsibilities

  • Define and execute a comprehensive company-wide Site Reliability Engineering strategy, embedding reliability as a core discipline across engineering teams
  • Build, lead, and develop a high-performing SRE organization, including hiring, mentoring, and fostering a reliability-focused culture
  • Establish SLIs, SLOs, KPIs, and error budgets to measure and drive platform reliability and performance improvement
  • Guide architecture decisions and technical roadmaps for highly available, resilient, and scalable distributed systems
  • Drive adoption of observability, monitoring, logging, and incident response solutions across cloud-based microservices environments, primarily on Google Cloud Platform
  • Establish and oversee robust incident response frameworks, operational governance, and post-incident analysis processes
  • Promote and implement best practices for infrastructure automation, cloud-native operations, and cost optimization
  • Lead continuous improvement and innovation initiatives, including exploring AI-driven operations and new SRE methodologies

Skills

  • 12+ years of experience in Site Reliability Engineering, Infrastructure Engineering, or DevOps in high-scale environments
  • 5+ years of proven technical leadership, building and scaling SRE teams and practices
  • Strong expertise with distributed systems, cloud-native infrastructures, microservices, and hands-on Google Cloud Platform experience (GKE, Compute Engine, Cloud Functions)
  • Deep proficiency with infrastructure as code, automation frameworks, and CI/CD deployment pipelines
  • Track record designing large-scale observability and monitoring solutions using tools like Prometheus, Grafana, Datadog, or New Relic
  • Excellent communication, organizational development, and mentorship abilities
  • Strong programming ability in Python, Go, Java, or similar languages
  • Cloud or reliability certifications (e.g., Google Cloud Professional, SRE certifications)
  • Experience implementing AIOps, anomaly detection, predictive analytics, or automated remediation/self-healing infrastructure
  • Familiarity with AI/ML tools for operational intelligence and intelligent alerting
  • Strong database performance tuning and distributed data systems knowledge
  • Comfortable operating in fast-paced, high-growth technology environments
  • Bachelor's degree in Computer Science, Engineering, or related field

Company Overview

  • Talently provides nationwide recruitment services, executive search, and career alignment programs. Founded in 2022, it is headquartered in Newport Beach, California, US, and has 11-50 employees. Its website is https://www.talently.com/.