Okta, 12.03.2026
Staff Site Reliability Engineer - Observability GCP
Bellevue
Responsibilities
- Design, build, and maintain scalable observability infrastructure using tools like Terraform
- Optimize the collection, processing, and storage of observability data to ensure high reliability and low latency of our Splunk and Grafana services
- Participate in on-call rotations and lead post-incident reviews to drive systemic improvements and observability-driven development
- Eliminate toil by automating the deployment and scaling of observability agents and collectors
Requirements
- 5+ years of experience scaling and managing observability on Google Cloud Platform
- Expertise in creating intuitive, actionable Splunk or Grafana dashboards that correlate data across multiple sources
- 3+ years of experience in an SRE, DevOps, or Systems Engineering role with a focus on high-availability systems
- Strong coding skills in Python and Go for building internal tools and automating workflows
- Deep understanding of Linux internals, networking (TCP/IP, DNS, load balancing), and container orchestration (Kubernetes/GKE)
- A data-driven approach to debugging complex, cross-service performance bottlenecks
- Ability to access federal environments and/or protected federal data
- U.S. Person status (e.g. a U.S. Citizen, National, Lawful Permanent Resident, Refugee, or Asylee)
Conditions
- Annual base salary range: $194,000 to $267,000 USD for San Francisco Bay Area
- Equity, bonus, and benefits including health, dental, and vision insurance
- 401(k), flexible spending account
- Paid leave including PTO and parental leave
- Hybrid work model