Anthropic, 06.05.2026
Senior Staff+ Software Engineer, Kubernetes Platform
San Francisco
Responsibilities
- Own, operate, and extend the Kubernetes scheduler for Anthropic's accelerator fleets, including custom scheduling plugins and policies for gang scheduling, topology awareness, and preemption
- Scale the Kubernetes control plane (apiserver, etcd, controller-manager) to support clusters far beyond typical limits, and find the next bottleneck before it finds us
- Design, build, and operate core cluster services, such as service discovery, that every workload in the fleet depends on
- Build and maintain custom controllers, operators, and CRDs
- Partner with research, training, and inference to understand workload shapes and turn their requirements into platform capabilities
- Collaborate with cloud providers on required features and escalations
- Participate in on-call, lead incident response, and design processes (postmortems, runbooks, SLOs) that help the team avoid repeating failures
Requirements
- Significant software engineering experience building and operating production distributed systems
- Proficiency in at least one systems-appropriate language (e.g., Go, Python, Rust, or C++)
- Deep, hands-on Kubernetes experience that goes well beyond being a "user of" it: working on the scheduler, controllers, or apiserver, or operating large multi-tenant clusters
- Demonstrated ability to debug complex issues across the stack, from API behavior down to node and network-level root causes
- A track record of designing for reliability, correctness, and clear failure semantics in systems other engineers depend on
- Strong written and verbal communication; comfort building consensus with internal stakeholders
- Experience with Kubernetes internals or contributions: kube-scheduler / scheduling framework, apiserver, etcd, client-go, controller-runtime, or similar
- Experience building or operating cluster schedulers or batch systems (e.g., Kueue, Volcano, Slurm, or in-house equivalents)
- Background scaling control planes or coordination systems (etcd, ZooKeeper, Consul, or large DNS/service-mesh deployments)
- Familiarity with ML infrastructure: GPUs, TPUs, or Trainium; gang scheduling; topology-aware placement; collective networking such as NCCL
- Experience with GCP and/or AWS, including GKE/EKS internals and Infrastructure as Code
- Low-level systems experience such as Linux kernel tuning, cgroups, or eBPF
- 8+ years of relevant industry experience, including time leading large, ambiguous infrastructure projects
Terms
- Annual salary: $320,000 to $405,000 USD
- Minimum education: Bachelor’s degree or an equivalent combination of education, training, and/or experience
- Required field of study: a field relevant to the role, as demonstrated through coursework, training, or professional experience
- Location-based hybrid policy: in the office at least 25% of the time
- Visa sponsorship: available (reasonable efforts will be made to obtain a visa)