[Remote] Sr. Engineering Manager, MLOps

100% Remote Full-time Open now

Note: This is a remote position open to candidates in the USA. Quince is a tech company disrupting the retail industry by leveraging AI, analytics, and automation. They are seeking a Senior Engineering Manager, MLOps to build and scale the infrastructure that supports production-grade Machine Learning, ensuring seamless operations for their Data Scientists and AI Researchers.

Responsibilities

  • Define the MLOps Vision & Strategy: Architect a long-term roadmap that transitions ML workflows from manual scripts to a fully automated, self-service platform for all Quince Data Scientists and AI Researchers
  • Own the "Paved Road" for Production: Build and maintain the end-to-end infrastructure for model training, deployment, and serving, ensuring researchers can move from "idea to production" with zero friction
  • Drive Strategic Prioritization: Partner with business leaders to align infrastructure investments with core e-commerce drivers like real-time personalization, dynamic pricing, and inventory forecasting
  • Lead "Build vs. Buy" Evaluations: Make high-judgment decisions on when to leverage cloud-native services (e.g., SageMaker, Vertex AI) versus building custom internal tools to optimize for cost, speed, and flexibility
  • Guarantee System Scalability & Reliability: Oversee the uptime and performance of production ML services, ensuring the stack can handle massive traffic surges and seasonal spikes without degradation
  • Manage Compute Governance & Costs: Direct the optimization of high-cost computational resources, such as GPU clusters and cloud instances, balancing high-performance training needs with fiscal responsibility
  • Recruit and Mentor Top Talent: Build and lead a high-performing team of ML Infra and DevOps engineers, providing technical coaching, career pathing, and performance management
  • Establish MLOps Standards: Drive the adoption of best practices in CI/CD for ML, Infrastructure as Code (IaC), and automated testing to ensure a modular and maintainable system
  • Bridge the Research-Engineering Gap: Act as the primary cross-functional lead, translating the complex needs of AI Researchers into actionable engineering requirements for the infrastructure team
  • Define and Track Velocity Metrics: Establish KPIs for the infrastructure team, such as model deployment frequency, mean time to recovery (MTTR), and infrastructure cost per inference
  • Champion Operational Excellence: Lead root-cause analyses (RCAs) for production failures and foster a culture of accountability where systemic fixes are prioritized over "quick patches"
  • Stay Ahead of the AI Curve: Monitor emerging trends in LLM-ops, vector databases, and real-time feature engineering to ensure Quince’s infrastructure remains competitive and future-proof

Skills

  • 10+ years of industry experience, with at least 3-5 years in a leadership or management role specifically focused on ML Infrastructure, MLOps, or large-scale Data Platform engineering
  • Proven track record of building and scaling MLOps platforms that support the full model lifecycle—from data ingestion and distributed training to real-time inference and monitoring
  • Deep technical expertise in cloud-native infrastructure (preferably AWS) and orchestration tools like Kubernetes (EKS), Docker, and Infrastructure as Code (Terraform/Pulumi)
  • Hands-on experience with ML frameworks and tooling, such as PyTorch, TensorFlow, Kubeflow, or SageMaker, and a strong opinion on how to integrate them into a cohesive developer experience
  • Expertise in building and managing Feature Stores and high-throughput data pipelines (using tools like Spark, Flink, or Kafka) to ensure data consistency across training and serving
  • Experience partnering with AI Research and Data Science teams to understand their unique workflows and translate research needs into robust, scalable engineering solutions
  • Strong understanding of CI/CD for ML, including automated testing for models, model versioning, and 'blue-green' or 'canary' deployment strategies
  • Demonstrated ability to manage high-cost compute resources, with experience optimizing GPU utilization and cloud spend in a hyper-growth environment
  • Excellence in operational leadership, with a history of driving service availability, performance, and stability through rigorous on-call rotations and root-cause analysis
  • A product-oriented mindset, with the ability to treat infrastructure as a platform and prioritize the roadmap based on researcher velocity and business ROI
  • Exceptional communication and influence skills, capable of navigating ambiguity and building consensus across engineering, product, and data science leadership
  • Kindness and high standards: You move fast and push for excellence, but you do so as a supportive team player who fosters a culture of psychological safety and extreme candor

Benefits

  • Bonus and equity may be provided for eligible roles

Company Overview

  • Quince is an e-commerce company that offers apparel, accessories, home goods, and personal care products through an online platform. It was founded in 2018 and is headquartered in San Francisco, California, USA, with a workforce of 1001-5000 employees. Its website is https://www.quince.com.

Company H1B Sponsorship

  • Quince has a track record of offering H1B sponsorships, with 1 sponsorship in 2023. Please note that this does not guarantee sponsorship for this specific role.