Crusoe · 6 days ago
Senior Manager, Infrastructure Platform Engineering
Full-time · On-site
Skills
Kubernetes, GCP, AWS, Azure, Prometheus, OpenTelemetry, Grafana
Responsibilities
- Leading the team responsible for the platform services that abstract underlying infrastructure into reliable, allocatable capacity, and for the systems that track and reconcile state across a large fleet
- Setting the technical roadmap across capacity and utilization intelligence, resource lifecycle and state management, and platform security and trust frameworks
- Driving the design of secure, well-instrumented platform systems, from Kubernetes-based orchestration and automation to lower-level system and hardware integration
- Hiring, mentoring, and growing a team of infrastructure software engineers; building a high-performing organization from a strong foundation
- Partnering with infrastructure, production engineering, and security teams to align platform capabilities with operational reliability, capacity, and trust requirements
- Improving platform efficiency and availability: characterizing bottlenecks, reducing stranded resources, and shortening operational and recovery cycles
- Establishing engineering standards for infrastructure software development: code quality, testing, deployment safety, and on-call practices for systems that span the platform
- Translating a vertically integrated infrastructure stack into reliable platform primitives that engineering teams can build on
- Staying technically hands-on: reviewing designs, contributing to architecture decisions, and being credible to the engineers you lead
Requirements
- 10+ years of experience in infrastructure or systems software development, with at least 3 years in an engineering leadership role
- Deep expertise in large-scale infrastructure platforms: building services that pool, allocate, and reconcile compute resources at scale
- Strong background with Kubernetes and cloud platforms (GCP, AWS, or Azure), including orchestration, automation, and operating distributed systems in production
- Experience with distributed state management and control systems: modeling resource and system lifecycle, reconciling desired vs. actual state, and handling failure gracefully across a large fleet
- Experience with efficiency, capacity, or performance engineering: characterizing system behavior, identifying bottlenecks, and driving measurable improvements in utilization or availability
- A player-coach approach to management: hands-on enough to make technical calls, structured enough to grow a team and ship through them
- Track record of hiring strong infrastructure engineers and helping them grow into more senior roles
- Comfortable operating in a fast-moving environment where the path isn't fully paved, and willing to drive ambiguity to clarity