Crusoe · 6 days ago

Senior Manager, Infrastructure Platform Engineering

Full-time · On-site

Skills

Kubernetes, GCP, AWS, Azure, Prometheus, OpenTelemetry, Grafana

01. Leading the team responsible for the platform services that abstract underlying infrastructure into reliable, allocatable capacity, and for the systems that track and reconcile state across a large fleet
02. Setting the technical roadmap across capacity and utilization intelligence, resource lifecycle and state management, and platform security and trust frameworks
03. Driving the design of secure, well-instrumented platform systems, from Kubernetes-based orchestration and automation to lower-level system and hardware integration
04. Hiring, mentoring, and growing a team of infrastructure software engineers; building a high-performing organization from a strong foundation
05. Partnering with infrastructure, production engineering, and security teams to align platform capabilities with operational reliability, capacity, and trust requirements
06. Improving platform efficiency and availability: characterizing bottlenecks, reducing stranded resources, and shortening operational and recovery cycles
07. Establishing engineering standards for infrastructure software development: code quality, testing, deployment safety, and on-call practices for systems that span the platform
08. Translating a vertically integrated infrastructure stack into reliable platform primitives that engineering teams can build on
09. Staying technically hands-on: reviewing designs, contributing to architecture decisions, and being credible to the engineers you lead

01. 10+ years of experience in infrastructure or systems software development, with at least 3 years in an engineering leadership role
02. Deep expertise in large-scale infrastructure platforms: building services that pool, allocate, and reconcile compute resources at scale
03. Strong background with Kubernetes and cloud platforms (GCP, AWS, or Azure): orchestration, automation, and operating distributed systems in production
04. Experience with distributed state management and control systems: modeling resource and system lifecycle, reconciling desired vs. actual state, and handling failure gracefully across a large fleet
05. Experience with efficiency, capacity, or performance engineering: characterizing system behavior, identifying bottlenecks, and driving measurable improvements in utilization or availability
06. A player-coach approach to management: hands-on enough to make technical calls, structured enough to grow a team and ship through them
07. Track record of hiring strong infrastructure engineers and helping them grow into more senior roles
08. Comfortable operating in a fast-moving environment where the path isn't fully paved, and willing to drive ambiguity to clarity