Synthesia · 16 days ago

Senior Site Reliability Engineer

Full-time · Remote

Responsibilities

01 Incident management & operational excellence: own the incident process, including on-call quality, response, and post-mortems, and drive down incident count, time-to-detect, and time-to-resolve
02 Automation & reliability engineering: automate low-frequency, high-consequence operations (the certificate-renewal class of problem: rare, easy to forget, and outage-causing when missed), not just high-frequency toil
03 A platform domain: over time, take deep ownership of a domain such as Temporal, observability, or Kubernetes operations, partnering with the engineers who build in it
04 Vendor & third-party management: own key external relationships and integrations (e.g. LLM API providers and other third-party services), which are currently managed manually and informally
05 FinOps: own cloud and platform cost visibility and efficiency, and the mechanics of how usage maps to billing

Requirements

01 Strong production operations experience on AWS and Kubernetes
02 Comfortable with MongoDB and scripting/automation in Python
03 An operations-and-reliability mindset: you take pride in systems that run quietly
04 Sound judgement on incidents and risk; calm and clear under pressure
05 Influences through relationships and evidence, not escalation; comfortable owning a domain and partnering across teams