JetBrains, 02.04.2026

Senior Research Engineer (Agentic Behavior)

Amsterdam

Responsibilities

  • Build tools for agentic error analysis
      • Design and implement tooling to systematically capture, classify, and analyse the errors that AI coding agents make when generating Kotlin code
      • Build observability pipelines over agentic traces, mining patterns from agent sessions in JetBrains IDEs, Junie, Claude Code, Cursor, and other coding agents
  • Build evaluation pipelines
      • Design, implement, and maintain evaluation pipelines that measure the quality of Kotlin code generation across several dimensions, including correctness, idiomaticity, build success, framework usage, and test coverage
      • Build simulation environments in which coding agents can be measured on realistic Kotlin developer tasks, from greenfield KMP projects and Gradle dependency management to migrating Spring applications from Java to Kotlin
      • Own the evaluation infrastructure: metrics, experiment tracking, automated regression checks, and reproducible benchmarking
  • Research methods for improving agent and model behavior on Kotlin
      • Experiment with post-training techniques (SFT, DPO, GRPO) to improve how models handle Kotlin-specific patterns, idioms, and frameworks
      • Investigate context engineering approaches: CLAUDE.md/AGENTS.md files, compiler-as-verifier feedback loops, Kotlin LSP integration, and MCP-based tooling
      • Run experiments to measure impact: A/B comparisons, benchmark suites, and before/after analyses on real codebases
      • Collaborate with model providers (Anthropic, OpenAI, and Google) to translate Kotlin-specific findings into model improvements
  • Build public Kotlin benchmarks
      • Design and build open-source benchmarks that measure AI coding agent performance on Kotlin tasks, with the goal of becoming the standard reference for the ecosystem
      • Create task datasets covering the breadth of Kotlin usage: server-side development (Spring, Ktor), multiplatform projects (KMP), build systems (Gradle), Android, library development, and more
      • Include both mined real-world tasks and carefully designed synthetic tasks that test specific Kotlin capabilities
      • Maintain and evolve the benchmarks as models improve, keeping them challenging, relevant, and resistant to contamination

Requirements

  • Hands-on experience building evaluation or analysis pipelines for LLMs or AI coding agents in a research or production setting
  • At least three years of strong Python engineering experience, with the ability to write clean, maintainable code in data-heavy, ML-adjacent codebases
  • Experience with data analysis at scale: querying large datasets (SQL/Athena), building data pipelines, and performing statistical analysis of experimental results
  • The ability to own projects end to end, from identifying a problem in agent traces to designing an eval, running experiments, and shipping a fix
  • A product-aware mindset: you care about how developers actually use agents and can translate real failure modes into evaluation and training work
  • Familiarity with Kotlin or a strong willingness to develop deep Kotlin expertise (you'll be living in Kotlin codebases daily)

What we offer

  • Strong base salary
  • Flexible work location
  • Remote work
  • Extra time off
  • Medical insurance allowance
  • Learning and development opportunities
  • Relocation support
  • Language classes
  • Fuel your day
  • Mental health support
  • Sports benefit
  • Internal events