[Remote] Senior Evaluation Specialist, AI Operations

100% Remote Full-time Open now

Note: This is a remote position open to candidates in the USA. Fetch helps people live rewarded every day by turning everyday activities into meaningful rewards. The Senior Evaluation Specialist in AI Operations will own the evaluation and dataset workstreams that improve AI system performance, define quality metrics, and build automations that enhance workflows.

Responsibilities

Own evaluation & datasets: Define evaluation approaches, design gold datasets (GDS), and ensure coverage of real-world scenarios and edge cases
Build evaluation systems: Develop manual and automated evals, including LLM-as-judge patterns, to measure model quality and performance
Translate ambiguity into structure: Turn open-ended questions into clear evaluation frameworks and execution plans
Build automations: Create automations that improve workflows, including dataset creation, evaluation pipelines, and lightweight operational processes
Measure and iterate: Define and track performance metrics; refine datasets, evaluations, and workflows based on results
Drive execution forward: Operate with urgency and ownership; identify next steps, unblock progress, and move work forward with minimal oversight
Collaborate cross-functionally: Partner with other Automation Specialists, engineering teams, project leads, and other stakeholders to deliver high-quality projects on time
Improve systems: Identify gaps and implement scalable improvements to evaluation and data workflows

Skills

3+ years of experience designing or working with evaluation frameworks, datasets, or quality measurement systems
Experience building or managing datasets (labeling, QA, iteration)
Ability to independently drive tasks from problem definition to execution
Hands-on experience with AI tools, LLM workflows, or automation platforms
Experience defining and tracking model performance metrics
Basic scripting or data skills (SQL, Python, etc.)
Experience with LLM-as-judge or model evaluation techniques
Familiarity with prompt evaluation or benchmarking approaches
Experience productionizing evaluation workflows with engineering teams

Benefits

Equity: We offer full-time employees equity in Fetch, so that everyone can benefit from Fetch’s growth.
401(k) Match: Dollar-for-dollar match up to 4%.
Benefits for humans and pets: We offer comprehensive medical, dental, and vision plans for everyone, including your pets.
Continuing Education: Fetch provides $10,000 per year in education reimbursement.
Employee Resource Groups: Take part in employee-led groups that are centered around fostering a diverse and inclusive workplace through events, dialogue and advocacy. The ERGs participate in our Inclusion Council with members of executive leadership.
Paid Time Off: On top of our flexible PTO, Fetch observes 9 paid holidays as well as a week-long year-end break.
Robust Leave Policies: 20 weeks of paid parental leave for primary caregivers, 14 weeks for secondary caregivers, and a flexible return to work schedule.
Calvin Care Cash: Employees welcoming new family members also receive a one-time $2,000 payment to help cover the cost of childcare, clothing, diapers, and more!
Flexible Work Environment: Collaborate with your team in one of our stunning offices, or work fully remotely from anywhere in the US. We'll make sure you are equally equipped with the hardware and software you need to get your job done from the comfort of your home. (Applicable for most roles.)

Company Overview

Fetch is a consumer-engagement platform that enables users to earn and redeem rewards. Founded in 2013, it is headquartered in Madison, Wisconsin, USA, and has a workforce of 501–1,000 employees. Its website is https://www.fetch.com.

Company H1B Sponsorship

Fetch has a track record of offering H1B sponsorships, with 2 in 2025, 2 in 2024, 2 in 2023, 1 in 2022, 2 in 2021, 2 in 2020. Please note that this does not guarantee sponsorship for this specific role.
