Baseten · 24.02.2026
Software Engineer — GPU Networking & Distributed Systems
Full-time · Remote
Responsibilities
- Integrate RDMA/RoCE/InfiniBand capabilities into the inference stack
- Implement and tune networking layers for Disaggregated KV Cache Offload and WideEP
- Work on checkpointing and storage mechanisms to achieve sub-10-second startup for trillion-parameter models
- Characterize and validate networking performance on H100/H200, B200/B300, and NVL72/GB300 clusters, and write acceptance tests
- Design observability tools for visualizing packet flow, congestion, and bandwidth across GPU interconnects
- Optimize communication kernels using NCCL and NVSHMEM, and potentially write custom kernels
Requirements
- Deep experience with high-performance networking protocols such as InfiniBand and RoCE v2
- Proficiency in C++ or Python and ability to work close to hardware
- Strong understanding of memory hierarchy in modern NVIDIA architectures (H100/Blackwell)
- Ability to debug NVLink topology issues and work with TensorRT-LLM source code
- Capability to evaluate off‑the‑shelf versus custom networking solutions
Benefits
- Competitive compensation with equity
- 100% coverage of medical, dental, and vision insurance for employee and dependents
- Flexible PTO policy and company‑wide Winter Break
- Paid parental leave and fertility/family‑building stipend
- Company‑facilitated 401(k) plan