Samsara, 24.03.2026
Lead Machine Learning Engineer - ML Infrastructure
Remote - US
Responsibilities
- Design, build, and operate Samsara’s end-to-end ML platform (training, experimentation, batch/online inference, edge) used by multiple Safety AI product teams
- Evolve shared training and experimentation infrastructure (orchestration, clusters, environments) and standardize tracking, evaluation, and regression testing
- Partner with product and applied ML teams to ship ML-powered features (CV models, EcoDriving insights, LLM-based reporting)
- Lead throughput and cost modeling for new ML features, from exploration to production-scale capacity planning
- Drive experiment design and evaluation: define success metrics, structure A/B or offline tests, and turn results into product and technical decisions
- Design and operate scalable online and batch inference systems (Ray, Spark), including deployment patterns, observability, SLOs, and unified training-to-production workflows
- Partner with firmware and edge teams to package, validate, and deploy models to Samsara devices, and build feedback loops from edge to cloud
- Own reliability, observability, and security for ML systems across cloud and edge, including on-call practices, incident response, and infrastructure hardening
- Own or co-own end-to-end technical delivery for high-priority or high-risk initiatives, from modeling and system design through production rollout
- Provide Staff+/Senior-Staff technical leadership on ML infrastructure architecture and strategy, influencing cross-team decisions and mentoring engineers and applied scientists
- Drive strong developer experience through documentation, office hours, and best practices, while contributing to and representing Samsara in open source communities (Ray, Spark, RayDP)
Requirements
- 10+ years of overall experience in machine learning engineering or related fields, with a strong track record of building and operating large-scale ML systems
- Strong experience with distributed computing frameworks such as Ray and/or Spark
- Hands-on experience with cloud infrastructure (AWS), containers/Kubernetes, and production observability tooling
- Proven experience building or supporting ML platforms (training, experimentation, or inference) used by multiple teams
- Solid understanding of ML fundamentals including evaluation, experiment design, and model iteration in production environments
- Experience shipping ML-powered features end-to-end, from design through production and iteration, with measurable impact on product or business metrics
- Background in computer vision and/or LLM-based systems in production environments
- Experience with edge or on-device ML and collaboration with firmware or embedded teams
Conditions
- Remote position open to candidates based in the United States