Crusoe · 28.04.2026

Senior Manager, Engineering

Full-time · On-site

Responsibilities

  • Manage, coach, and grow a team of production engineers across shifts and time zones
  • Run structured 1:1s focused on career development, deliver candid performance feedback, and build a team culture grounded in ownership and continuous improvement
  • Partner with engineering leadership and recruiting to grow the team, owning the full hiring lifecycle from interview design to offer
  • Build and continuously improve onboarding and training programs that ramp new engineers quickly and effectively
  • Serve as an escalation point for high-severity incidents
  • Lead postmortems with a focus on systemic fixes, ensure action items are tracked and completed, and drive down MTTR over time
  • Define, monitor, and report on SLIs, SLOs, and SLAs across Crusoe's production systems
  • Oversee the design and maintenance of alerting and observability systems across bare-metal and cloud infrastructure, ensuring the team has the signal it needs to detect and respond to issues fast
  • Identify and prioritize opportunities to automate repetitive operational work, improving team efficiency and system resilience over time
  • Collaborate with infrastructure, platform engineering, product, and customer success teams to align on technical escalations, customer impact, and engineering priorities
  • Own the team's day-to-day operational rhythm (stand-ups, on-call rotations, incident reviews, and sprint planning), ensuring the team runs smoothly across time zones

Requirements

  • 6+ years of experience managing 24/7 technical operations or SRE teams in cloud or data center environments, including demonstrated success developing senior engineers, building organizational capability, and improving operational outcomes at scale
  • Strong Linux and infrastructure fundamentals, including hands-on experience with containerization, Kubernetes, and virtualization in production environments
  • Observability and monitoring expertise, including experience with Prometheus, VictoriaMetrics, and custom exporters, ideally against bare-metal endpoints
  • Familiarity with messaging and workflow systems such as RabbitMQ, Kafka, NATS, or Temporal, and an understanding of how they function in distributed production environments
  • Working proficiency in Golang or Python: enough to review production code, contribute meaningfully to technical design discussions, and support your engineers' work
  • Demonstrated people management skills, including experience with structured performance management, individualized coaching, and building or improving onboarding and training programs
  • SLA/SLO ownership experience: you've set them, measured them, reported on them, and held teams accountable to them in a customer-facing environment
  • A track record of influencing cross-functional strategy and driving alignment across engineering leadership on operational priorities