Crusoe, 15.05.2026

Senior Staff Data Center Operations Engineer, GPU Hardware Architecture

Full-time, On-site

Responsibilities

  • Provide deep-dive technical guidance to the Data Center Engineering team on upcoming silicon
  • Ensure future facility designs for power, cooling, and rack spacing are ready for per-chip densities of 2000W+
  • Leverage AI/ML methodologies to analyze fleet-wide telemetry
  • Lead the transition from reactive troubleshooting to predictive maintenance
  • Architect the site-level sparing strategy from a technical perspective
  • Use failure telemetry and MTBF data to define the Critical Spares List
  • Create precision SOPs for high-stakes GPU repairs
  • Develop diagnostic tooling that helps Site Ops identify hardware issues
  • Act as the Tier-3 escalation point for complex hardware failures
  • Lead Root Cause Analysis on systemic issues
  • Maintain a 24-month forward-looking view of NVIDIA and AMD architectures
  • Support the technical relationship with OEMs and VARs
  • Audit hardware builds and review technical bulletins

Requirements

  • 10+ years in Hardware Engineering, Systems Architecture, or Data Center Infrastructure
  • Proven track record of educating and influencing cross-functional teams
  • Experience managing or architecting GPU clusters at scale (thousands of nodes) at a hyperscaler, GPU-specialized cloud, or major silicon vendor