Baseten, 24.02.2026

Software Engineer — GPU Networking & Distributed Systems

Full-time, Remote

Responsibilities

  • Integrate RDMA/RoCE/InfiniBand capabilities into the inference stack
  • Implement and tune networking layers for Disaggregated KV Cache Offload and WideEP
  • Work on checkpointing and storage mechanisms to achieve sub-10-second startup for trillion-parameter models
  • Characterize and validate networking performance on H100/H200, B200/B300, and NVL72/GB300 clusters, and write acceptance tests
  • Design observability tools for visualizing packet flow, congestion, and bandwidth across GPU interconnects
  • Optimize communication kernels using NCCL and NVSHMEM, potentially writing custom kernels

Requirements

  • Deep experience with high-performance networking protocols such as InfiniBand and RoCE v2
  • Proficiency in C++ or Python and the ability to work close to the hardware
  • Strong understanding of the memory hierarchy in modern NVIDIA architectures (H100/Blackwell)
  • Ability to debug NVLink topology issues and work with TensorRT-LLM source code
  • Capability to evaluate off-the-shelf versus custom networking solutions

Benefits

  • Competitive compensation with equity
  • 100% coverage of medical, dental, and vision insurance for employees and dependents
  • Flexible PTO policy and company-wide Winter Break
  • Paid parental leave and fertility/family-building stipend
  • Company-facilitated 401(k) plan