[Remote] Senior Cloud Operations Engineer

100% Remote Full-time Open now

Note: This is a remote position open to candidates in the USA. The Linux Foundation is a driving force in fostering open source collaboration and supports communities across a range of projects, including PyTorch. It is seeking a Senior Cloud Operations Engineer to focus on infrastructure operations for the PyTorch project: automating processes, optimizing cloud-native tools, and ensuring a robust and scalable cloud environment.

Responsibilities

  • Manage multi-cloud environments, primarily focusing on AWS services (EKS, EC2, S3, IAM, ELB)
  • Contribute to architectural exercises with open source community and technical leads to validate new cloud infrastructure
  • Implement and maintain infrastructure-as-code using Terraform via pytorch/ci-infra and pytorch/test-infra
  • Optimize cloud resource utilization and implement FinOps practices for cost management and reporting
  • Design, implement, and maintain CI/CD pipelines using GitHub Actions and ARC, including runner configurations and other elements of the CI ecosystem
  • Debug and triage issues in build and test pipelines, drawing on experience with unit testing
  • Develop monitoring and alerting solutions for CI/CD workflows and critical infrastructure
  • Manage and optimize Cloudflare CDN deployments for PyTorch assets (R2/S3)
  • Implement best practices for CDN and overall infrastructure security
  • Develop comprehensive monitoring and observability solutions using Datadog, AWS CloudWatch, and other telemetry data collection and processing tools
  • Review and recommend monitoring solutions as project and community needs evolve
  • Participate in on-call rotations supporting operations and incident response using incident.io
  • Establish and maintain escalation procedures and resolution processes
  • Participate in ci-infra and multi-cloud working groups and support architecture decisions
  • Collaborate with external contributors and promote DevOps best practices
  • Manage GitHub repositories, including user onboarding and access control
  • Attend and contribute to technical meetings, including Infrastructure, CI Workflow, and Technical Advisory Council sessions
  • Develop and maintain technical documentation for infrastructure and processes
  • Provide guidance on developer best practices and tooling
  • Create and update runbooks for common operational tasks and incident response

Skills

  • Ability to work with communities made up of industry specialists and collaborate outside of the Linux Foundation
  • Bachelor's degree in Computer Science, Engineering, or related field
  • 7+ years of experience in cloud operations with significant AWS expertise
  • Strong knowledge of infrastructure-as-code principles and tools, particularly Terraform
  • Proficiency in scripting languages (Python, TypeScript, Bash) and containerization technologies (Docker, Kubernetes)
  • Experience with Cloudflare CDN management and optimization
  • Expertise in implementing and managing monitoring solutions, specifically Datadog and AWS CloudWatch
  • Familiarity with incident management tools and processes, particularly incident.io
  • Demonstrated experience in CI/CD pipeline design and implementation
  • Strong problem-solving skills and ability to troubleshoot complex systems
  • Excellent communication skills and experience collaborating with open source communities
  • Experience with PyTorch or other open source communities
  • Multi-cloud expertise across AWS, GCP, and Azure
  • GitHub ARC experience
  • Knowledge of FinOps principles and cloud cost optimization strategies
  • Contributions to open source projects, especially in infrastructure management roles
  • Familiarity with the Linux Foundation or similar open source foundations
  • Experience mentoring other engineers and fostering a collaborative team environment

Benefits

  • The Linux Foundation maintains a predominantly remote workforce
  • Committed to hiring top-notch talent
  • Providing a flexible and supportive work culture
  • Collaboration is embedded in our DNA
  • Work closely together while not being confined to a traditional office space

Company Overview

  • The Linux Foundation is the organization of choice for the world's top developers and companies to build ecosystems that accelerate open technology development and commercial adoption. It was founded in 2000 and is headquartered in San Francisco, California, USA, with a workforce of 201-500 employees. Its website is http://www.linuxfoundation.org.