Cursor, 13.04.2026

Software Engineer, Agent Evaluation and Quality

Full-time, Office

Responsibilities

  • 01 Designing and building a best-in-class AI evaluation system: curated datasets, offline replay, scorers / judges, regression alerts, and dashboards
  • 02 Designing feedback loops from real usage: collecting, cleaning, and interpreting user signals to inform model and harness changes
  • 03 Developing analysis tooling and workflows for debugging agent behavior: deep dives on failure modes, clustering themes, and surfacing actionable insights
  • 04 Improving reliability and guardrails by making quality measurable and operational: defining “good/bad/degraded” sessions, alerting, and triage primitives

Requirements

  • 01 You’ve built and operated evaluation or measurement systems, such as AI evals, experimentation, ranking/relevance, or search quality. You can turn ambiguous “quality” questions into concrete metrics, pipelines, and decisions
  • 02 You have strong data acumen and can collaborate effectively with data scientists and researchers
  • 03 You have taste and strong opinions on model and agent behaviors. You stay up to date on emerging research and industry trends
  • 04 You have strong software engineering fundamentals and enjoy shipping production systems