Baseten · 10.03.2026
Infrastructure Ops Engineer
Full-time · Remote
Responsibilities
- Manage daily node operations, including tainting/untainting, node draining, and PVC repairs, to maintain GPU fleet health and control operational costs
- Partner with Sales and account teams to scope and fulfill customer capacity requests, translating complex timelines into concrete infrastructure actions and clear ETAs
- Identify recurring gaps in the capacity lifecycle (intake, triage, comms) and drive fixes by defining lightweight processes and improving system observability
- Act as the operational bridge between SRE and Infra teams, executing discrete changes and verifying system status during high-stakes maintenance windows
- Contribute to the internal knowledge base for GPU-specific issues (H100/A100/B200) to accelerate future incident resolution
- Identify repetitive workflows and partner with engineering to build scripts, dashboards, and internal tools that reduce manual intervention and shorten time-to-mitigation
- Maintain a living database of GPU-specific intelligence (H100/B200) and market moves to accelerate incident resolution and support strategic briefings for leadership
Requirements
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field
- 2+ years of professional work experience, ideally in a customer-facing technical role or as a junior SRE/Cloud Engineer
- Strong familiarity with Kubernetes and the lifecycle of cloud-based container orchestration
- Strong ownership mindset and attention to detail, demonstrated through fast detection, clear communication, and reliable follow-through
- Demonstrated ability to communicate complex technical blockers clearly to both internal engineering teams and external vendors
Benefits
- Competitive compensation, including meaningful equity
- 100% coverage of medical, dental, and vision insurance for employees and dependents
- Flexible PTO policy, including a company-wide Winter Break (offices closed from Christmas Eve to New Year's Day)
- Paid parental leave
- Fertility and family-building stipend through Carrot
- Company-facilitated 401(k)
- Exposure to a variety of ML startups, offering unparalleled learning and networking opportunities
- Preference for SF- or NYC-based candidates