Okta, 24.03.2026

Staff Site Reliability Engineer - Kubernetes

Bellevue

Responsibilities

  • Design, implement, and maintain highly available, scalable, and fault-tolerant Kubernetes platforms
  • Build, manage, and optimize AWS cloud infrastructure, including EKS, ECS, S3, VPCs, RDS, IAM, and more
  • Utilize Helm to automate and streamline the deployment of applications and services to Kubernetes clusters
  • Implement and manage Karpenter to dynamically scale Kubernetes clusters in response to workload demands
  • Configure and manage Istio to provide service-to-service communication, security, and observability within the Kubernetes clusters
  • Automate the deployment, scaling, and management of infrastructure and applications
  • Respond to incidents, troubleshoot, and resolve system issues related to performance, availability, and security
  • Design and implement secure cloud infrastructure with appropriate access controls, network security, and compliance frameworks
  • Create and maintain detailed documentation for Kubernetes platform setup, operational procedures, and best practices

Requirements

  • 4+ years of experience with Kubernetes/Helm
  • 4+ years of experience with Terraform
  • 5+ years of experience with AWS
  • Experience with multi-region cloud environments
  • Proven experience with AWS (EC2, RDS, S3, CloudFormation, IAM, etc.) and solid understanding of cloud-native architectures
  • Strong expertise in Kubernetes platform creation, management, and optimisation (e.g., setting up highly available clusters, networking, and storage)
  • Hands-on experience with Helm for Kubernetes application deployment and management
  • Practical experience with Karpenter for dynamic scaling of Kubernetes clusters and optimising resource usage
  • Expertise in managing and securing Istio for service mesh, including traffic management, security, and observability features
  • Proficiency in CI/CD pipelines and automation tools (e.g., Jenkins, GitLab, CircleCI, Terraform, Ansible, Spinnaker)
  • Strong scripting and automation skills in Python, Bash, or Go for infrastructure management and platform automation
  • Experience with monitoring, logging, and alerting tools such as Prometheus, Grafana, CloudWatch, and ELK Stack
  • Understanding of security best practices for cloud platforms and Kubernetes (e.g., role-based access control (RBAC), encryption, and compliance frameworks)
  • Familiarity with Docker and containerization principles
  • Bachelor’s degree in Computer Science, Engineering, or related field (or equivalent professional experience)
  • This position requires the ability to access federal environments and/or have access to protected federal data
  • The successful candidate must be able to submit documentation establishing U.S. Person status (e.g., a U.S. Citizen, National, Lawful Permanent Resident, Refugee, or Asylee; see 22 CFR 120.15) upon hire
  • Requires in-person onboarding and travel to our San Francisco, CA HQ office or our Chicago office during the first week of employment

Conditions

  • Annual base salary range for candidates in the San Francisco Bay Area: $194,000 to $267,000 USD
  • Hybrid work model (#LI-Hybrid)
  • Requires in-person onboarding and travel to San Francisco, CA HQ or Chicago office during the first week