Senior AI Safety Clinical Reviewer (TPM)
Clinical QA Lead, AI Safety Evaluations
Our Story
mpathic is building the future of empathetic, trustworthy AI. Grounded in behavioral science and human-centered design, our technology delivers AI systems that are safe, aligned, and emotionally intelligent. As enterprises race to adopt AI, we believe the companies that win will be those that build trust first.
We are building a high-quality AI Safety Team to evaluate and strengthen advanced AI systems. Our work focuses on making models reliable, auditable, and scalable—so safety work can move fast without relying on heroics or sacrificing quality.
Position Overview
We’re looking for a Clinical Reviewer & Trainer who is a licensed or clinically trained expert with deep experience in psychological safety, qualitative evaluation, and clinical supervision or training. This role owns clinical QA systems rather than just participating in them, making final calls on clinical quality within defined guardrails. The ideal hire will manage, mentor, and set standards for clinicians completing diverse tasks, ranging from data generation and benchmarking to evaluation, safety-rubric development, and red teaming. This role is ~50% clinical QA and adjudication, ~50% expert training and calibration.
This role does not provide therapy, crisis intervention, or on-call clinical care.
In this role, you will serve as the clinical quality backbone of our evaluation programs—ensuring that expert raters apply rubrics consistently, edge cases are adjudicated rigorously, and our outputs reflect high-quality clinical judgment in emotionally complex AI interactions.
You’ll operate at the intersection of:
- Clinical judgment and supervision
- AI safety evaluation rigor
- Expert training and calibration
- Rubric interpretation and edge-case adjudication
- Qualitative data review and synthesis
- Scalable QA systems
This is a full-time, remote role (US preferred), reporting to the Head of AI Safety or the Evaluation Programs GM.
What You’ll Accomplish
In your first 60–90 days you’ll…
- Establish and evolve a reliable clinical QA and adjudication workflow across evaluation projects, identifying gaps and improving existing systems
- Train and onboard expert raters on safety rubrics, shared standards, and evaluation philosophy
- Run calibration sessions to reduce disagreement and improve consistency
- Serve as the escalation point for ambiguous or high-risk conversations
- Deliver adjudicated, high-confidence evaluation datasets for an initial pilot (e.g., stressful life events conversations)
- Demonstrate calibrated clinical judgment aligned with mpathic’s safety philosophy
- Design workflows that others can run without constant oversight
In your first year, you’ll…
- Own clinical quality systems across multiple concurrent AI safety evaluation programs
- Develop scalable training, certification, and feedback loops for expert evaluators
- Continuously improve inter-rater reliability, rubric clarity, and reviewer consistency
- Build gold-standard examples and clinical playbooks for sensitive user contexts
- Identify recurring model failure patterns and inform rubric/tooling improvements, delivering outputs that can withstand customer, partner, and audit scrutiny
- Help shape mpathic’s long-term approach to emotionally grounded AI safety QA
- Balance clinical nuance, operational clarity, and customer needs in real-world evaluation delivery
You’ll Thrive in This Role If You…
Have 5+ years of experience in one or more of:
- Clinical supervision, training, or peer review
- Clinical research operations or qualitative review
- Trust & safety, human-centered AI, or behavioral science workflows
- AI evaluation, annotation QA, or rubric-based assessment systems
- Coaching clinicians or experts to apply shared standards in high-stakes domains
And you bring clinical expertise such as:
- Licensed clinician (LCSW, LMFT, PsyD, PhD, MD) or equivalent applied experience
- Deep familiarity with psychological stress, trauma, crisis response, or mental health frameworks
- Expertise evaluating emotionally complex conversations for safety, appropriateness, and harm risk
- Strong judgment in ambiguous edge cases involving distress, vulnerability, or sensitive disclosures
You are skilled at:
- Synthesizing qualitative review data across large datasets to identify recurring patterns, failure modes, and sources of rater disagreement, and translating those insights into improved rubrics, training materials, and QA workflows (a sketch of this kind of synthesis follows this list)
- Training experts to apply nuanced rubrics consistently
- Running calibration sessions and reducing disagreement across raters
- Making and documenting difficult adjudication calls with rigor
- Maintaining clinical quality without slowing execution
- Updating guidance as edge cases and model behaviors evolve
- Communicating clearly through structured feedback, examples, and reviewer notes
- Helping other clinicians apply shared evaluation standards
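To make that synthesis work concrete, here is a minimal sketch of one way to surface disagreement patterns from rubric ratings. Everything in it is an assumption for illustration: the `ratings.csv` export, the long-format schema (`item_id`, `rater_id`, `dimension`, `score`), and the one-severity-level dispute threshold are hypothetical, not an mpathic data model.

```python
# Minimal sketch: surface rater-disagreement patterns from rubric ratings.
# Hypothetical long-format schema: item_id, rater_id, dimension, score.
import pandas as pd

ratings = pd.read_csv("ratings.csv")  # hypothetical export of expert ratings

# Score spread per (item, rubric dimension): a wide range signals disagreement.
spread = (
    ratings.groupby(["item_id", "dimension"])["score"]
    .agg(n_raters="count", spread=lambda s: s.max() - s.min())
    .reset_index()
)

# Flag items whose raters differ by more than one severity level (assumed threshold).
disputed = spread[(spread["n_raters"] >= 2) & (spread["spread"] > 1)]

# Rank rubric dimensions by how often they are disputed: the top ones are
# candidates for clearer definitions, new anchor examples, or a calibration session.
by_dimension = disputed.groupby("dimension")["item_id"].count().sort_values(ascending=False)
print(by_dimension.head(10))

# Route the most contested items to clinical adjudication first.
print(disputed.sort_values("spread", ascending=False).head(20))
```

The same pattern extends to failure-mode tags or free-text reviewer notes; the point is that disagreement data, aggregated, shows where the rubric, not the raters, needs work.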
What You’ll Do
Own Clinical QA & Adjudication (Core)
- Review expert ratings for clinical appropriateness, consistency, and safety alignment
- Serve as the escalation point for edge cases, ambiguous conversations, or disagreement clusters
- Make final adjudication decisions and document rationale clearly
- Ensure evaluation outputs are rigorous, clinically grounded, and customer-ready
- Evaluate safety, appropriateness, and response quality—not clinical outcomes
- Maintain responsible scope and handling of sensitive psychological content
Train & Enable Expert Evaluators
- Onboard and certify new expert raters on mpathic rubrics and evaluation philosophy (see the certification sketch after this list)
- Develop training materials, examples, and calibration exercises
- Coach clinicians and evaluators to engage in high-level qualitative analysis
- Run ongoing learning loops to prevent rater drift over time
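One lightweight way to think about certification is a gold-set check: score a candidate’s labels against adjudicated gold labels and require both exact and near-miss agreement to clear a bar. The function below is a minimal sketch; the 0.80/0.95 thresholds and the integer severity scale are invented for illustration, not mpathic’s actual certification criteria.

```python
# Minimal sketch: certify a new rater against an adjudicated gold set.
# Thresholds and the integer severity scale are illustrative assumptions.
from typing import Sequence

def certify(rater: Sequence[int], gold: Sequence[int],
            exact_floor: float = 0.80, adjacent_floor: float = 0.95) -> bool:
    """Pass if the rater matches gold exactly at least exact_floor of the
    time and lands within one severity level at least adjacent_floor of
    the time."""
    assert len(rater) == len(gold) and len(gold) > 0, "need paired, non-empty labels"
    exact = sum(r == g for r, g in zip(rater, gold)) / len(gold)
    adjacent = sum(abs(r - g) <= 1 for r, g in zip(rater, gold)) / len(gold)
    return exact >= exact_floor and adjacent >= adjacent_floor

# Example: 19/20 exact matches, all 20 within one level -> certified.
print(certify([2, 3, 1] * 6 + [4, 0], [2, 3, 1] * 6 + [3, 0]))
```

In practice a failed check would feed targeted coaching on the dimensions the candidate missed, rather than acting as a bare pass/fail gate.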
Drive Calibration & Quality Systems
- Implement review tiers: peer review → clinical QA → escalation/adjudication
- Run inter-rater reliability measurement and disagreement-reduction workflows (see the agreement sketch after this list)
- Maintain gold sets, anchor examples, and severity calibration standards
- Identify patterns of confusion and update rubric guidance accordingly
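For the agreement measurement itself, Cohen’s kappa is the standard two-rater statistic: observed agreement corrected for the agreement expected by chance. Below is a minimal self-contained sketch with invented labels; real programs would typically extend this to weighted kappa for ordinal severity scales or Krippendorff’s alpha for more than two raters.

```python
# Minimal sketch: Cohen's kappa for two raters labeling the same items.
# Sample labels are invented; kappa = (p_obs - p_exp) / (1 - p_exp).
from collections import Counter

def cohens_kappa(a, b):
    assert len(a) == len(b) and len(a) > 0, "need paired, non-empty label lists"
    n = len(a)
    p_obs = sum(x == y for x, y in zip(a, b)) / n                   # observed agreement
    count_a, count_b = Counter(a), Counter(b)
    labels = set(count_a) | set(count_b)
    p_exp = sum(count_a[l] * count_b[l] for l in labels) / (n * n)  # chance agreement
    return 1.0 if p_exp == 1 else (p_obs - p_exp) / (1 - p_exp)

rater_1 = [0, 1, 2, 2, 1, 0, 3, 2]
rater_2 = [0, 1, 2, 1, 1, 0, 3, 3]
print(f"kappa = {cohens_kappa(rater_1, rater_2):.2f}")  # ~0.67 on this toy data
```

Tracking kappa per rubric dimension over time is one way to make rater drift visible before it shows up in customer deliverables.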
Strengthen Rubrics Through Clinical Feedback
- Partner with behavioral science experts to refine rating criteria and edge-case handling
- Surface recurring rubric gaps or unclear definitions
- Ensure rubrics balance clinical nuance with operational usability
Cross-Functional Collaboration
Work closely with:
- TPMs and Evaluation Leads — delivery execution, workflows, escalation systems
- Clinical & Behavioral Science Experts — rubric grounding, psychological frameworks
- QA Leadership — agreement metrics, gold sets, drift monitoring
- Engineering / Product — tooling support for review, audit trails, and escalation queues
- Customer Delivery — ensuring findings are interpretable and trustworthy
About the Team
You’ll collaborate closely with:
- Clinical & Behavioral Science Experts — rubric design, psychological grounding
- QA / Evaluation Leadership — calibration, review systems, drift monitoring
- Engineering / Product — tooling, automation, evaluation pipelines
- Customer Delivery — scoping, results, renewals
We value calm execution, clinical rigor, operational excellence, and scalable systems over fragile heroics.
Compensation & Benefits
- Base Salary (US): $140,000–$190,000 (band depends on clinical seniority, scope, and number of evaluation programs supported)
- Equity: Yes
- Benefits: We offer 100% company-funded health, dental, and vision insurance for full-time employees. Additionally, we offer a 401(k), well-being programs, and flexible paid time off.
- Remote-first
- Mission-driven work focused on AI safety, trust, and operational rigor
Apply Even If You Don’t Check Every Box
If you’re excited about bringing clinical judgment, training excellence, and quality systems into AI safety evaluation work—and want to help ensure emotionally grounded AI systems are safe and trustworthy—we’d love to hear from you.