[Remote] Senior Evaluation Specialist, AI Operations

100% Remote | Full-time | Open now

Note: This is a remote role open to candidates in the USA. Fetch helps people live rewarded every day by turning everyday activities into meaningful rewards. The Senior Evaluation Specialist in AI Operations will own evaluation and dataset workstreams to improve AI system performance, defining quality metrics and building automations that improve workflows.

Responsibilities

  • Own evaluation & datasets: Define evaluation approaches, design gold datasets (GDS), and ensure coverage of real-world scenarios and edge cases
  • Build evaluation systems: Develop manual and automated evals, including LLM-as-judge patterns, to measure model quality and performance
  • Translate ambiguity into structure: Turn open-ended questions into clear evaluation frameworks and execution plans
  • Build automations: Create automations that improve workflows, including dataset creation, evaluation pipelines, and lightweight operational processes
  • Measure and iterate: Define and track performance metrics; refine datasets, evaluations, and workflows based on results
  • Drive execution forward: Operate with urgency and ownership; identify next steps, unblock progress, and move work forward with minimal oversight
  • Collaborate cross-functionally: Partner with other Automation Specialists, engineering, stakeholders in other functions, and project leads to deliver high-quality projects on time
  • Improve systems: Identify gaps and implement scalable improvements to evaluation and data workflows

Skills

  • 3+ years of experience designing or working with evaluation frameworks, datasets, or quality measurement systems
  • Experience building or managing datasets (labeling, QA, iteration)
  • Ability to independently drive tasks from problem definition to execution
  • Hands-on experience with AI tools, LLM workflows, or automation platforms
  • Experience defining and tracking model performance metrics
  • Basic scripting or data skills (SQL, Python, etc.)
  • Experience with LLM-as-judge or model evaluation techniques
  • Familiarity with prompt evaluation or benchmarking approaches
  • Experience productionizing evaluation workflows with engineering teams

Benefits

  • Equity: We offer full-time employees equity in Fetch, so that everyone can benefit from Fetch’s growth.
  • 401k Match: Dollar-for-dollar match up to 4%.
  • Benefits for humans and pets: We offer comprehensive medical, dental, and vision plans for everyone, including your pets.
  • Continuing Education: Fetch provides $10,000 per year in education reimbursement.
  • Employee Resource Groups: Take part in employee-led groups that are centered around fostering a diverse and inclusive workplace through events, dialogue and advocacy. The ERGs participate in our Inclusion Council with members of executive leadership.
  • Paid Time Off: On top of our flexible PTO, Fetch observes 9 paid holidays, as well as our year-end week-long break.
  • Robust Leave Policies: 20 weeks of paid parental leave for primary caregivers, 14 weeks for secondary caregivers, and a flexible return to work schedule.
  • Calvin Care Cash: Employees welcoming new family members also receive a one-time $2,000 incentive to help cover the cost of childcare, clothing, diapers, and much more!
  • Flexible Work Environment (applicable for most roles): Collaborate with your team in one of our stunning offices, or work fully remotely from anywhere in the US. We'll make sure you have the same hardware and software you need to get your job done from the comfort of your home.

Company Overview

  • Fetch is a consumer-engagement platform that enables users to earn and redeem rewards. Founded in 2013, it is headquartered in Madison, Wisconsin, USA, and has a workforce of 501-1000 employees. Its website is https://www.fetch.com.
Company H1B Sponsorship

  • Fetch has a track record of offering H1B sponsorships, with 2 in 2025, 2 in 2024, 2 in 2023, 1 in 2022, 2 in 2021, 2 in 2020. Please note that this does not guarantee sponsorship for this specific role.