Okta, 24.03.2026
Staff Site Reliability Engineer - Kubernetes
Bellevue
Responsibilities
- Design, implement, and maintain highly available, scalable, and fault-tolerant Kubernetes platforms
- Build, manage, and optimize AWS cloud infrastructure, including EKS, ECS, S3, VPCs, RDS, IAM, and more
- Utilize Helm to automate and streamline the deployment of applications and services to Kubernetes clusters
- Implement and manage Karpenter to dynamically scale Kubernetes clusters in response to workload demands
- Configure and manage Istio to provide service-to-service communication, security, and observability within the Kubernetes clusters
- Automate the deployment, scaling, and management of infrastructure and applications
- Respond to incidents, troubleshoot, and resolve system issues related to performance, availability, and security
- Design and implement secure cloud infrastructure with appropriate access controls, network security, and compliance frameworks
- Create and maintain detailed documentation for Kubernetes platform setup, operational procedures, and best practices
Requirements
- 4+ years of experience with Kubernetes/Helm
- 4+ years of experience with Terraform
- 5+ years of experience with AWS
- Experience with multi-region cloud environments
- Proven experience with AWS (EC2, RDS, S3, CloudFormation, IAM, etc.) and solid understanding of cloud-native architectures
- Strong expertise in Kubernetes platform creation, management, and optimization (e.g., setting up highly available clusters, networking, and storage)
- Hands-on experience with Helm for Kubernetes application deployment and management
- Practical experience with Karpenter for dynamic scaling of Kubernetes clusters and optimizing resource usage
- Expertise in managing and securing Istio for service mesh, including traffic management, security, and observability features
- Proficiency in CI/CD pipelines and automation tools (e.g., Jenkins, GitLab, CircleCI, Terraform, Ansible, Spinnaker)
- Strong scripting and automation skills in Python, Bash, or Go for infrastructure management and platform automation
- Experience with monitoring, logging, and alerting tools such as Prometheus, Grafana, CloudWatch, and ELK Stack
- Understanding of security best practices for cloud platforms and Kubernetes (e.g., role-based access control (RBAC), encryption, and compliance frameworks)
- Familiarity with Docker and containerization principles
- Bachelor's degree in Computer Science, Engineering, or related field (or equivalent professional experience)
- This position requires the ability to access federal environments and/or protected federal data
- The successful candidate must be able to submit documentation establishing U.S. Person status (e.g., a U.S. Citizen, National, Lawful Permanent Resident, Refugee, or Asylee; see 22 CFR 120.15) upon hire
- Requires in-person onboarding and travel to our San Francisco, CA HQ office or our Chicago office during the first week of employment
Conditions
- Annual base salary range for candidates in the San Francisco Bay Area: $194,000 to $267,000 USD
- Hybrid work model
- Requires in-person onboarding and travel to San Francisco, CA HQ or Chicago office during the first week