Cognition, 09.05.2026

Site Reliability Engineer

Full-time, Office

Responsibilities

  • 01 Define and own SLOs, SLIs, and error budgets for Devin and Windsurf
  • 02 Build monitoring, alerting, and observability systems for service health
  • 03 Lead incident response and run blameless postmortems
  • 04 Build runbooks and tooling for on-call
  • 05 Own deployment pipelines, release infrastructure, and internal developer tooling
  • 06 Manage cloud infrastructure through code
  • 07 Build reproducible, version-controlled environments
  • 08 Model growth, forecast resource needs, and ensure infrastructure scales
  • 09 Profile and improve system performance
  • 10 Ensure security misconfigurations and vulnerabilities are caught and remediated
  • 11 Partner with product and engineering teams to build reliability from the start

Requirements

  • 01 Deep experience running production systems at scale
  • 02 Strong software engineering fundamentals
  • 03 Proficiency with cloud infrastructure (AWS, GCP, or Azure)
  • 04 Experience with container orchestration (Kubernetes)
  • 05 Experience with infrastructure as code (Terraform or equivalent)
  • 06 Experience building and owning CI/CD pipelines
  • 07 Strong observability instincts
  • 08 Track record of reducing toil through automation
  • 09 Comfort owning incidents end to end
  • 10 Product empathy to understand reliability from the user's perspective
  • 11 Experience with developer-facing products or platforms is a plus

What We Offer

  • 01 Base Salary: $260,000 - $300,000 + significant early-stage equity
  • 02 Medical, Dental, Vision: Fully paid for you and your dependents
  • 03 401(k): Company match included
  • 04 Perks: Private chef, cozy slippers, endless snacks, and more
  • 05 Small, highly selective team
  • 06 High ownership and high trust environment
  • 07 Remote work flexibility