Cognition, 09.05.2026
Site Reliability Engineer
Full-time, Office
Responsibilities
- Define and own SLOs, SLIs, and error budgets for Devin and Windsurf
- Build monitoring, alerting, and observability systems for service health
- Lead incident response and run blameless postmortems
- Build runbooks and tooling for on-call
- Own deployment pipelines, release infrastructure, and internal developer tooling
- Manage cloud infrastructure through code
- Build reproducible, version-controlled environments
- Model growth, forecast resource needs, and ensure infrastructure scales
- Profile and improve system performance
- Ensure security misconfigurations and vulnerabilities are caught and remediated
- Partner with product and engineering teams to build reliability from the start
Requirements
- Deep experience running production systems at scale
- Strong software engineering fundamentals
- Proficiency with cloud infrastructure (AWS, GCP, or Azure)
- Experience with container orchestration (Kubernetes)
- Experience with infrastructure as code (Terraform or equivalent)
- Experience building and owning CI/CD pipelines
- Strong observability instincts
- Track record of reducing toil through automation
- Comfort owning incidents end to end
- Product empathy to understand reliability from the user's perspective
- Experience with developer-facing products or platforms is a plus
Conditions
- Base Salary: $260,000 - $300,000 + significant early-stage equity
- Medical, Dental, Vision: Fully paid for you and your dependents
- 401(k): Company match included
- Perks: Private chef, cozy slippers, endless snacks, and more
- Small, highly selective team
- High ownership and high trust environment
- Remote work flexibility