Crusoe, 28.04.2026
Senior Manager, Engineering
Full-time, On-site
Responsibilities
- Manage, coach, and grow a team of production engineers across shifts and time zones
- Run structured 1:1s focused on career development, deliver candid performance feedback, and build a team culture grounded in ownership and continuous improvement
- Partner with engineering leadership and recruiting to grow the team, owning the full hiring lifecycle from interview design to offer
- Build and continuously improve onboarding and training programs that ramp new engineers quickly and effectively
- Serve as an escalation point for high-severity incidents
- Lead postmortems with a focus on systemic fixes, ensure action items are tracked and completed, and drive down MTTR over time
- Define, monitor, and report on SLIs, SLOs, and SLAs across Crusoe's production systems
- Oversee the design and maintenance of alerting and observability systems across bare-metal and cloud infrastructure, ensuring the team has the signal it needs to detect and respond to issues fast
- Identify and prioritize opportunities to automate repetitive operational work, improving team efficiency and system resilience over time
- Collaborate with infrastructure, platform engineering, product, and customer success teams to align on technical escalations, customer impact, and engineering priorities
- Own the team's day-to-day operational rhythm (stand-ups, on-call rotations, incident reviews, and sprint planning), ensuring the team runs smoothly across time zones
Requirements
- 6+ years of experience managing 24/7 technical operations or SRE teams in cloud or data center environments, including demonstrated success developing senior engineers, building organizational capability, and improving operational outcomes at scale
- Strong Linux and infrastructure fundamentals, including hands-on experience with containerization, Kubernetes, and virtualization in production environments
- Observability and monitoring expertise, including experience with Prometheus, VictoriaMetrics, and custom exporters, ideally against bare-metal endpoints
- Familiarity with messaging and workflow systems such as RabbitMQ, Kafka, NATS, or Temporal, and an understanding of how they function in distributed production environments
- Working proficiency in Golang or Python: enough to review production code, contribute meaningfully to technical design discussions, and support your engineers' work
- Demonstrated people management skills, including experience with structured performance management, individualized coaching, and building or improving onboarding and training programs
- SLA/SLO ownership experience: you've set them, measured them, reported on them, and held teams accountable to them in a customer-facing environment
- A track record of influencing cross-functional strategy and driving alignment across engineering leadership on operational priorities