JetBrains, 02.04.2026
Senior Research Engineer (Agentic Behavior)
Amsterdam
Responsibilities
- Build tools for agentic error analysis
  - Design and implement tooling to systematically capture, classify, and analyse errors that AI coding agents make when generating Kotlin code
  - Build observability pipelines over agentic traces, mining patterns from agent sessions in JetBrains IDEs, Junie, Claude Code, Cursor, and other coding agents
- Build evaluation pipelines
  - Design, implement, and maintain evaluation pipelines that measure Kotlin code generation quality across dimensions including correctness, idiomaticity, build success, framework usage, and test coverage
  - Build simulation environments where coding agents can be measured on realistic Kotlin developer tasks, from greenfield KMP projects and Gradle dependency management to migrating Spring applications from Java to Kotlin
  - Own evaluation infrastructure: metrics, experiment tracking, automated regression checks, and reproducible benchmarking
- Research methods for improving agent and model behavior on Kotlin
  - Experiment with post-training techniques (SFT, DPO, GRPO) to improve how models handle Kotlin-specific patterns, idioms, and frameworks
  - Investigate context engineering approaches: CLAUDE.md/AGENTS.md files, compiler-as-verifier feedback loops, Kotlin LSP integration, and MCP-based tooling
  - Run experiments to measure impact: A/B comparisons, benchmark suites, and before/after analyses on real codebases
  - Collaborate with model providers (Anthropic, OpenAI, and Google) to translate Kotlin-specific findings into model improvements
- Build public Kotlin benchmarks
  - Design and build open-source benchmarks that measure AI coding agent performance on Kotlin tasks and eventually become the standard reference for the ecosystem
  - Create task datasets covering the breadth of Kotlin usage: the server side (Spring, Ktor), multiplatform projects (KMP), build systems (Gradle), Android, library development, and others
  - Include both mined real-world tasks and carefully designed synthetic tasks that test specific Kotlin capabilities
  - Maintain and evolve benchmarks as models improve, ensuring they remain challenging, relevant, and contamination-resistant
Requirements
- Hands-on experience building evaluation or analysis pipelines for LLMs or AI coding agents in a research or production setting
- Strong Python engineering skills (at least three years), with the ability to write clean, maintainable code in data-heavy and ML-adjacent codebases
- Experience with data analysis at scale: querying large datasets (SQL/Athena), building data pipelines, and performing statistical analysis of experimental results
- The ability to own projects end to end, from identifying a problem in agent traces to designing an eval, running experiments, and shipping a fix
- A product-aware mindset: you care about how agents are actually used by developers and can translate real failure modes into evaluation and training work
- Familiarity with Kotlin or a strong willingness to develop deep Kotlin expertise (you'll be living in Kotlin codebases daily)
Benefits
- Strong base salary
- Flexible work location
- Remote work
- Extra time off
- Medical insurance allowance
- Learning and development opportunities
- Relocation support
- Language classes
- Fuel your day
- Mental health support
- Sports benefit
- Internal events