Okta, 12.03.2026

Staff Site Reliability Engineer - Observability GCP

Bellevue

Responsibilities

  • Design, build, and maintain scalable observability infrastructure using tools like Terraform
  • Optimize the collection, processing, and storage of observability data to ensure high reliability and low latency of our Splunk and Grafana services
  • Participate in on-call rotations and lead post-incident reviews to drive systemic improvements and observability-driven development
  • Eliminate toil by automating the deployment and scaling of observability agents and collectors

Requirements

  • 5+ years of experience scaling and managing observability on Google Cloud Platform
  • Expertise in creating intuitive, actionable Splunk or Grafana dashboards that correlate data across multiple sources
  • 3+ years of experience in an SRE, DevOps, or Systems Engineering role with a focus on high-availability systems
  • Strong coding skills in Python, Go for building internal tools and automating workflows
  • Deep understanding of Linux internals, networking (TCP/IP, DNS, load balancing), and container orchestration (Kubernetes/GKE)
  • A data-driven approach to debugging complex, cross-service performance bottlenecks
  • Ability to access federal environments and/or protected federal data
  • U.S. Person status (e.g. a U.S. Citizen, National, Lawful Permanent Resident, Refugee, or Asylee)

Conditions

  • Annual base salary range: $194,000 to $267,000 USD for San Francisco Bay Area
  • Equity, bonus, and benefits including health, dental and vision insurance
  • 401(k), flexible spending account
  • Paid leave including PTO and parental leave
  • Hybrid work model