Crusoe, 15.05.2026
Senior Staff Data Center Operations Engineer, GPU Hardware Architecture
Full-time, Office
Responsibilities
- Provide deep-dive technical guidance to the Data Center Engineering team on upcoming silicon
- Ensure future facility designs for power, cooling, and rack spacing are ready for 2000W+ per-chip densities
- Leverage AI/ML methodologies to analyze fleet-wide telemetry
- Lead the transition from reactive troubleshooting to predictive maintenance
- Architect the site-level sparing strategy from a technical perspective
- Use failure telemetry and MTBF data to define the Critical Spares List
- Create precision SOPs for high-stakes GPU repairs
- Develop diagnostic tooling for Site Ops to identify hardware issues
- Act as the Tier-3 escalation point for complex hardware failures
- Lead Root Cause Analysis on systemic issues
- Maintain a 24-month forward-looking view of NVIDIA and AMD architectures
- Support the technical relationship with OEMs and VARs
- Audit hardware builds and review technical bulletins
Requirements
- 10+ years in Hardware Engineering, Systems Architecture, or Data Center Infrastructure
- Proven track record of educating and influencing cross-functional teams
- Managed or architected GPU clusters at scale (thousands of nodes) at a hyperscaler, GPU-specialized cloud, or major silicon vendor