AI Evaluation Manager

100% Remote Full-time Open now

About the Role

Luma is pushing the boundaries of generative AI, building tools that redefine how visual content is created. We’re seeking a candidate to help shape and scale the way we understand, measure, and improve model performance. In this role, you’ll partner with researchers, engineers, and technical artists to evaluate our models against real-world creative use cases, design frameworks that capture qualitative nuance, and identify actionable insights that guide development.

This is not a checkbox metrics role — it's about building evaluative systems that match the complexity of human perception, creativity, and intention.

Responsibilities

  • Evaluate generative model performance across diverse tasks, prompts, and modalities.

  • Identify key failure modes, regression patterns, and edge cases that impact product quality.

  • Develop and maintain qualitative evaluation frameworks that are scalable and reusable.

  • Collaborate closely with technical artists and engineers to align evaluations with model capabilities and target use cases.

  • Translate high-level product goals into concrete evaluative criteria.

  • Lead qualitative studies, side-by-side comparisons, and human-in-the-loop evaluation efforts.

  • Provide detailed feedback that informs model fine-tuning, dataset curation, and product UX.

  • Stay informed about emerging evaluation standards in generative AI and creative tools.

Qualifications

  • Master’s degree or higher in Cognitive Science, Human-Computer Interaction (HCI), Design Research, Psychology, Media Studies, or a related field.

  • 5+ years of experience in product evaluation, UX research, model testing, or similar roles that involve structured qualitative assessment.

  • Deep familiarity with creative workflows and real-world use cases for generative models (e.g., animation, filmmaking, digital art, VFX).

  • Strong systems thinking and the ability to define abstract qualities (like believability, identity retention, or scene coherence) in clear evaluative terms.

  • Experience working cross-functionally with engineers, researchers, and creatives.

  • Excellent written communication skills and the ability to synthesize nuanced judgments into clear, actionable insights.

Nice to Have

  • Background in motion, visual effects, or storytelling pipelines.

  • Experience evaluating AI-generated media (video, images, 3D).

  • Prior work building internal tools for qualitative data collection or scoring.

  • Familiarity with prompt engineering and reference-based input methods.

Originally posted on Himalayas
