Routing for Large ML Models
Cohen, Ofir, Yallouz, Jose, Schapira, Michael, Belkar, Shahar, Mizrahi, Tal
arXiv.org Artificial Intelligence
The communication patterns induced by these training processes exhibit high regularity and persistence, giving rise to significant opportunities for optimizing the manner in which flows are routed across the network. We present an algorithmic framework for quantifying network-wide efficiency in the context of training LLMs (and other large-scale ML models), and for periodically optimizing routing with respect to this global metric.

Our aim is to devise methodologies for the online adaptation of routing configurations in ML training clusters that improve global training efficiency and fairness. Our approach builds on two characteristics of ML training and modern networking: Traffic patterns induced by ML training tend to exhibit ...
Mar-7-2025