Routing for Large ML Models
Cohen, Ofir, Yallouz, Jose, Schapira, Michael, Belkar, Shahar, Mizrahi, Tal
arXiv.org Artificial Intelligence
The communication patterns induced by these training processes exhibit high regularity and persistence, giving rise to significant opportunities for optimizing the manner in which flows are routed across the network. We present an algorithmic framework for quantifying network-wide efficiency in the context of training LLMs (and other large-scale ML models), and for periodically optimizing routing with respect to this global metric. Our aim is to devise methodologies for the online adaptation of routing configurations in ML training clusters that improve global training efficiency and fairness. Our approach builds on two characteristics of ML training and modern networking: traffic patterns induced by ML training tend to exhibit ...
Mar-7-2025