Crusoe · 6 days ago

Senior Manager, Infrastructure Platform Engineering

Full-time · On-site

Skills

Kubernetes, GCP, AWS, Azure, Prometheus, OpenTelemetry, Grafana

Responsibilities

  • Leading the team responsible for the platform services that abstract underlying infrastructure into reliable, allocatable capacity, and for the systems that track and reconcile state across a large fleet
  • Setting the technical roadmap across capacity and utilization intelligence, resource lifecycle and state management, and platform security and trust frameworks
  • Driving the design of secure, well-instrumented platform systems, from Kubernetes-based orchestration and automation to lower-level system and hardware integration
  • Hiring, mentoring, and growing a team of infrastructure software engineers; building a high-performing organization from a strong foundation
  • Partnering with infrastructure, production engineering, and security teams to align platform capabilities with operational reliability, capacity, and trust requirements
  • Improving platform efficiency and availability: characterizing bottlenecks, reducing stranded resources, and shortening operational and recovery cycles
  • Establishing engineering standards for infrastructure software development: code quality, testing, deployment safety, and on-call practices for systems that span the platform
  • Translating a vertically integrated infrastructure stack into reliable platform primitives that engineering teams can build on
  • Staying technically hands-on: reviewing designs, contributing to architecture decisions, and being credible to the engineers you lead

Requirements

  • 10+ years of experience in infrastructure or systems software development, with at least 3 years in an engineering leadership role
  • Deep expertise in large-scale infrastructure platforms: building services that pool, allocate, and reconcile compute resources at scale
  • Strong background with Kubernetes and cloud platforms (GCP, AWS, or Azure): orchestration, automation, and operating distributed systems in production
  • Experience with distributed state management and control systems: modeling resource and system lifecycle, reconciling desired vs. actual state, and handling failure gracefully across a large fleet
  • Experience with efficiency, capacity, or performance engineering: characterizing system behavior, identifying bottlenecks, and driving measurable improvements in utilization or availability
  • A player-coach approach to management: hands-on enough to make technical calls, structured enough to grow a team and ship through them
  • Track record of hiring strong infrastructure engineers and helping them grow into more senior roles
  • Comfortable operating in a fast-moving environment where the path isn't fully paved, and willing to drive ambiguity to clarity