Reka, 14.05.2026
Member of Technical Staff (Data Intelligence)
Full-time, Remote
Responsibilities
- Work with model researchers to define what “good data” means for our models, including quality metrics, validation checks, and acceptance thresholds
- Explore open source datasets and create internal ones best suited for building fundamental World Models
- Build algorithms for automated data quality assessment, data domain mixtures, and domain adaptation from synthetic to real data
- Track datasets, metadata, provenance, and versions so experiments are reproducible and it’s clear what data went into which training and evaluation runs
- Own CI/CD and development tooling for the data stack (GitHub, Python, PyTorch), and automate repetitive workflows to reduce friction
- Track and optimize throughput, storage, and compute utilization across pipelines and related assets
Requirements
- Strong ML and deep learning fundamentals, with experience building and operating large-scale data and/or compute systems
- Comfortable moving between research questions and production engineering: you can dig into data, run analyses, and also ship reliable systems
- Demonstrated research experience with data composition, data quality, and dataset releases
- Ability to design and execute experiments with convincing, unbiased outcomes
- Practical experience with distributed processing and orchestration (Spark, Ray, Airflow, or equivalents)
- Solid Python skills and familiarity with the tooling around modern model training workflows (datasets, checkpoints, experiment tracking)
- Strong instincts around data quality: how to measure it, how to monitor it, and how to prevent regressions as things scale
- Able to work in a fast-moving environment, prioritize what matters, and communicate clearly with both researchers and engineers
- Bonus: experience with large video datasets, dataset curation for training, or building internal tooling for evaluation/analysis in ML environments