Okta, 24.03.2026

Staff Site Reliability Engineer - Kubernetes

Bellevue

Responsibilities

01 Design, implement, and maintain highly available, scalable, and fault-tolerant Kubernetes platforms
02 Build, manage, and optimize AWS cloud infrastructure, including EKS, ECS, S3, VPCs, RDS, IAM, and more
03 Utilize Helm to automate and streamline the deployment of applications and services to Kubernetes clusters
04 Implement and manage Karpenter to dynamically scale Kubernetes clusters in response to workload demands
05 Configure and manage Istio to provide service-to-service communication, security, and observability within the Kubernetes clusters
06 Automate the deployment, scaling, and management of infrastructure and applications
07 Respond to incidents, troubleshoot, and resolve system issues related to performance, availability, and security
08 Design and implement secure cloud infrastructure with appropriate access controls, network security, and compliance frameworks
09 Create and maintain detailed documentation for Kubernetes platform setup, operational procedures, and best practices

Requirements

01 4+ years of experience with Kubernetes/Helm
02 4+ years of experience with Terraform
03 5+ years of experience with AWS
04 Experience with multi-region cloud environments
05 Proven experience with AWS (EC2, RDS, S3, CloudFormation, IAM, etc.) and solid understanding of cloud-native architectures
06 Strong expertise in Kubernetes platform creation, management, and optimization (e.g., setting up highly available clusters, networking, and storage)
07 Hands-on experience with Helm for Kubernetes application deployment and management
08 Practical experience with Karpenter for dynamic scaling of Kubernetes clusters and optimizing resource usage
09 Expertise in managing and securing Istio for service mesh, including traffic management, security, and observability features
10 Proficiency in CI/CD pipelines and automation tools (e.g., Jenkins, GitLab, CircleCI, Terraform, Ansible, Spinnaker)
11 Strong scripting and automation skills in Python, Bash, or Go for infrastructure management and platform automation
12 Experience with monitoring, logging, and alerting tools such as Prometheus, Grafana, CloudWatch, and ELK Stack
13 Understanding of security best practices for cloud platforms and Kubernetes (e.g., role-based access control (RBAC), encryption, and compliance frameworks)
14 Familiarity with Docker and containerization principles
15 Bachelor’s degree in Computer Science, Engineering, or related field (or equivalent professional experience)
16 This position requires the ability to access federal environments and/or have access to protected federal data
17 The successful candidate must be able to submit documentation establishing U.S. Person status (e.g., a U.S. Citizen, National, Lawful Permanent Resident, Refugee, or Asylee; see 22 CFR 120.15) upon hire
18 Requires in-person onboarding and travel to our San Francisco, CA HQ office or our Chicago office during the first week of employment

Conditions

01 Annual base salary range for candidates in the San Francisco Bay Area: $194,000 to $267,000 USD
02 Hybrid work model (#LI-Hybrid)
03 Requires in-person onboarding and travel to San Francisco, CA HQ or Chicago office during the first week
