Google Cloud Platform DevOps Engineer

100% Remote Full-time Open now

The Lead DevOps Engineer, a key member of the EIT DevOps Team, is responsible for the staging and production infrastructure of Iron Mountain's Digital Services within the federal sector. This role is pivotal in managing and optimizing staging and production deployment environments across Google Cloud Platform (GCP), Amazon Web Services (AWS), and Microsoft Azure.

Core Responsibilities:

  • Cloud Infrastructure Management: Deploy, manage, and maintain cloud infrastructure across AWS, Azure, and/or Google Cloud Platform, ensuring compliance for government workloads.
  • Infrastructure Automation: Automate infrastructure provisioning using Infrastructure as Code (IaC) tools like Terraform, OpenTofu, or AWS CloudFormation.
  • Deployment Pipeline Streamlining: Collaborate with development teams to streamline CI/CD pipelines using tools such as GitLab and OpenTofu for efficient infrastructure and application delivery.
  • Performance Optimization: Monitor system performance, participate in capacity planning, and optimize application and infrastructure performance by tuning configurations and identifying bottlenecks.
  • Automation Development: Develop scripts and tools to automate routine operations, including patching, scaling, and monitoring.
  • Self-Healing Systems: Design and implement self-healing systems that proactively detect and resolve faults.
  • Data Integrity & Availability: Manage backup and disaster recovery strategies to ensure data integrity and availability across environments.
  • Security & Compliance: Perform regular security audits and vulnerability patching, adhering to government compliance requirements (e.g., FedRAMP, NIST).

Incident Management & Observability:

  • Real-time Incident Resolution: Respond to and resolve infrastructure incidents and outages in real time, minimizing disruption.
  • Root Cause Analysis (RCA): Conduct RCA for production issues and implement long-term corrective actions.
  • On-Call Participation: Participate in an on-call rotation, escalating and coordinating responses to high-severity issues.
  • Incident Documentation: Document incidents, responses, and postmortems to capture lessons learned.
  • Complex Problem Diagnosis: Diagnose complex infrastructure and application problems, including database performance issues, latency, and service connectivity challenges.
  • Comprehensive Logging & Telemetry: Ensure comprehensive logging and telemetry to support incident response, performance tuning, and auditing.
  • Observability Improvements: Drive observability improvements by collaborating with Engineering and Platform teams to enhance system reliability and traceability.

Application & Knowledge Management:

  • Application Incident Leadership: Lead resolution efforts for application-level incidents, ensuring coordinated response across teams.
  • Application Lifecycle Management: Oversee application lifecycle management, including version upgrades, security patches, and regional rollouts.
  • Knowledge Base Contribution: Contribute to a shared knowledge base, documenting recurring issues and resolution steps.
  • Scaling Strategies: Support scaling strategies to meet regional demand, ensuring infrastructure resilience and compliance with service-level objectives (SLOs).

Qualifications:

  • Minimum 5 years of experience leading and supporting enterprise-level applications in production environments.
  • Proven experience in cloud infrastructure provisioning and management on Google Cloud Platform (GCP), Amazon Web Services (AWS), or Microsoft Azure.
  • Proficiency in scripting languages such as Python, Bash, or PowerShell for automation and systems management.
  • Strong understanding of containerization and orchestration technologies, including Docker, Kubernetes, and Helm.
  • Hands-on experience with cloud object storage services such as AWS S3, Google Cloud Storage, or Azure Blob Storage.
  • Working knowledge of database and persistence technologies, particularly MongoDB and PostgreSQL.
  • Experience supporting and integrating microservices architectures and RESTful APIs.
  • Familiarity with incident and service management systems, such as ServiceNow and Jira.
  • Experience with SAST/DAST security and compliance tooling, such as Prisma Cloud, CrowdStrike, XSOAR, and Burp Suite.
  • Basic understanding of identity and access management (IAM) and SSO technologies, particularly Okta, and application integration practices.
  • Excellent troubleshooting skills, especially in complex, distributed, cloud-based environments.
  • Strong written and verbal communication skills, with the ability to clearly document procedures, incidents, and solutions.
  • Effective at producing support documentation and conducting knowledge transfer or training sessions.
  • Demonstrated ability to work independently with minimal supervision in a fast-paced, collaborative, and globally distributed team.
  • A motivated, proactive mindset with a commitment to delivering high-quality, secure, and reliable systems.
