Towards Easy and Realistic Network Infrastructure Testing for Large-scale Machine Learning
Yoo, Jinsun; Lao, ChonLam; Cao, Lianjie; Lantz, Bob; Yu, Minlan; Krishna, Tushar; Sharma, Puneet
arXiv.org Artificial Intelligence
This paper lays the foundation for Genie, a testing framework that captures the impact of real hardware network behavior on ML workload performance without requiring expensive GPUs. Genie uses CPU-initiated traffic over a hardware testbed to emulate GPU-to-GPU communication, and adapts the ASTRA-sim simulator to model the interaction between the network and the ML workload.
Apr-30-2025
- Country:
- North America > United States (0.48)
- Genre:
- Research Report (0.40)
- Industry:
- Information Technology (0.69)
- Telecommunications > Networks (0.68)
- Technology: