Samsara | 27.03.2026

Senior Software Engineer II, Developer Experience / Operational Excellence

Remote - UK

Responsibilities

  • Design and build automated reliability and self-healing systems (automated rollbacks, deploy safeguards, and fault mitigation) that protect production at scale, and deliver them as platform tooling that engineering teams across the company adopt for their own services
  • Own and improve incident management tooling and on-call health. Reduce alert noise, surface actionable signals, and empower engineering teams to operate their services confidently with minimal operational burden
  • Develop and evolve observability infrastructure, including monitoring, alerting, SLOs, and performance regression detection, to give teams real-time, actionable visibility into system health and latency
  • Contribute to AI-driven operational tooling that goes beyond triage, building toward autonomous remediation in which AI detects issues and takes corrective action so that systems self-recover with minimal human involvement
  • Drive incident prevention by identifying systemic patterns and ruthlessly eliminating operational toil
  • Partner directly with product engineering teams to diagnose reliability gaps, reduce their operational burden, and help them adopt best practices for running their services
  • Define and champion operational excellence best practices across engineering through guardrails, scorecards, and standards that help teams run their services reliably by default
  • Champion, role model, and embed Samsara’s cultural principles (Focus on Customer Success, Build for the Long Term, Adopt a Growth Mindset, Be Inclusive, Win as a Team) as we scale globally and across new offices

Requirements

  • 8+ years of experience designing and building products on a software engineering team
  • Bachelor's degree in Computer Science/Engineering or equivalent practical experience
  • 3+ years of experience on infrastructure and/or platform engineering focused teams
  • Expertise in observability and reliability, operational metrics, and data analysis
  • Proven track record architecting monitoring frameworks, SLO platforms, and automated response workflows
  • Experience with Datadog (or equivalent observability tooling such as New Relic or Grafana)
  • Proven experience working on large-scale enterprise software applications
  • Experience in Developer Experience (DevEx) and internal portals: designing and implementing solutions and tools that centralise and simplify engineering operations