Reka, 14.05.2026

Member of Technical Staff (Data Intelligence)

Full-time, Remote

Responsibilities

  • Work with model researchers to define what “good data” means for our models, including quality metrics, validation checks, and acceptance thresholds
  • Explore open-source datasets and create internal datasets best suited to building foundational World Models
  • Build algorithms for automated data quality assessment, data domain mixtures, and synthetic-to-real domain adaptation
  • Track datasets, metadata, provenance, and versions so experiments are reproducible and it’s clear what data went into which training and evaluation runs
  • Own CI/CD and development tooling for the data stack (GitHub, Python, PyTorch), and automate repetitive workflows to reduce friction
  • Track and optimize throughput, storage, and compute utilization across pipelines and related assets

Requirements

  • Strong ML and deep learning fundamentals, with experience building and operating large-scale data and/or compute systems
  • Comfortable moving between research questions and production engineering: you can dig into data, run analyses, and also ship reliable systems
  • Demonstrated research experience with data composition, data quality, and dataset releases
  • Ability to design and run experiments that produce convincing, unbiased results
  • Practical experience with distributed processing and orchestration (Spark, Ray, Airflow, or equivalents)
  • Solid Python skills and familiarity with the tooling around modern model training workflows (datasets, checkpoints, experiment tracking)
  • Strong instincts around data quality: how to measure it, how to monitor it, and how to prevent regressions as things scale
  • Able to work in a fast-moving environment, prioritize what matters, and communicate clearly with both researchers and engineers
  • Bonus: experience with large video datasets, dataset curation for training, or building internal tooling for evaluation/analysis in ML environments