Cursor, 13.04.2026
Software Engineer, Agent Evaluation and Quality
Full-time, Office
Responsibilities
- Designing and building a best-in-class AI evaluation system: curated datasets, offline replay, scorers / judges, regression alerts, and dashboards
- Designing feedback loops from real usage: collecting, cleaning, and interpreting user signals to inform model and harness changes
- Developing analysis tooling and workflows for debugging agent behavior: deep dives into failure modes, clustering recurring themes, and surfacing actionable insights
- Improving reliability and guardrails by making quality measurable and operational: defining “good/bad/degraded” sessions, alerting, and triage primitives
Requirements
- You’ve built and operated evaluation or measurement systems, such as AI evals, experimentation, ranking/relevance, or search quality. You can turn ambiguous “quality” questions into concrete metrics, pipelines, and decisions
- You have strong data acumen and can collaborate effectively with data scientists and researchers
- You have taste and strong opinions on model and agent behaviors. You stay up to date on emerging research and industry trends
- You have strong software engineering fundamentals and enjoy shipping production systems