Evaluation Scenario Writer - AI Agent Testing Specialist

100% Remote, Full-time, Open now

This opportunity is only for candidates currently residing in the specified country. Your location may affect eligibility and rates.

At Mindrift, innovation meets opportunity. We believe in using the power of collective intelligence to ethically shape the future of AI.

What we do

The Mindrift platform connects specialists with AI projects from major tech innovators. Our mission is to unlock the potential of Generative AI by tapping into real-world expertise from across the globe.

About the Role

We’re looking for someone who can design realistic and structured evaluation scenarios for LLM-based agents. You’ll create test cases that simulate human-performed tasks and define gold-standard behavior to compare agent actions against. You’ll work to ensure each scenario is clearly defined, well-scored, and easy to execute and reuse. You’ll need a sharp analytical mindset, attention to detail, and an interest in how AI agents make decisions.

Although every project is unique, you might typically:

  • Design structured test scenarios based on real-world tasks
  • Define the golden path and acceptable agent behavior
  • Annotate task steps, expected outputs, and edge cases
  • Work with developers to test your scenarios and improve their clarity
  • Review agent outputs and adapt tests accordingly

How to get started

Simply apply to this post, qualify, and get the chance to contribute to projects aligned with your skills, on your own schedule. From creating training prompts to refining model responses, you’ll help shape the future of AI while ensuring technology benefits everyone.

Requirements

  • You have a Bachelor’s or Master’s degree in Computer Science, Software Engineering, Data Science / Data Analytics, Artificial Intelligence / Machine Learning, Computational Linguistics / Natural Language Processing (NLP), Information Systems, or another related field.
  • You have 3+ years of experience.
  • Your level of English is advanced (C1) or above.
  • You are ready to learn new methods, can switch between tasks and topics quickly, and are comfortable working with challenging, complex guidelines at times.
  • Our freelance role is fully remote, so you just need a laptop, an internet connection, available time, and enthusiasm to take on a challenge.

Benefits

Why might this freelance opportunity be a great fit for you?

  • Take part in a part-time, remote, freelance project that fits around your primary professional or academic commitments.
  • Work on advanced AI projects and gain valuable experience that enhances your portfolio.
  • Influence how future AI models understand and communicate in your field of expertise.

Originally posted on Himalayas
