Synthesia · 16 days ago
Senior Site Reliability Engineer
Full-time · Remote
Responsibilities
- Incident management & operational excellence: own the incident process, including on-call quality, response, and post-mortems, and drive down incident count, time-to-detect, and time-to-resolve
- Automation & reliability engineering: automate low-frequency, high-consequence operations (the certificate-renewal class of problem: rare, easy to forget, and outage-causing when missed), not just high-frequency toil
- A platform domain: over time, take deep ownership of a domain such as Temporal, observability, or Kubernetes operations, partnering with the engineers building in it
- Vendor & third-party management: own key external relationships and integrations (e.g. LLM API providers, third-party services), which today are managed manually and informally
- FinOps: own cloud and platform cost visibility and efficiency, and the mechanics of how usage maps to billing
Requirements
- Strong production operations experience on AWS and Kubernetes
- Comfortable with MongoDB and scripting/automation in Python
- An operations-and-reliability mindset: you take pride in systems that run quietly
- Sound judgement on incidents and risk; calm and clear under pressure
- Influences through relationships and evidence, not escalation; comfortable owning a domain and partnering across teams
Conditions
- Remote (US East Coast preferred, for timezone coverage)