Routing for Large ML Models

Cohen, Ofir, Schapira, Jose Yallouz Michael, Belkar, Shahar, Mizrahi, Tal

Mar-7-2025–arXiv.org Artificial Intelligence

The communication Our aim is to devise methodologies for the online adaptation patterns induced by these training process exhibit of routing configurations in ML training clusters that high regularity and persistence, giving rise to significant improve global training efficiency and fairness. Our approach opportunities for optimizing the manner in which flows are builds on two characteristics of ML training and modern networking: routed across the network. We present an algorithmic framework for quantifying network-wide efficiency in the context of training LLMs (and other large-scale ML models), and for periodically optimizing routing with respect to this global Traffic patterns induced by ML training tend to exhibit metric.

controller, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

Mar-7-2025

arXiv.org PDF

Add feedback

Country:
- North America > United States (0.14)

Genre:
- Research Report (0.64)

Industry:
- Information Technology (0.69)
- Telecommunications (0.48)
- Transportation (0.48)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning
    - Neural Networks > Deep Learning (1.00)
    - Statistical Learning (0.93)
  - Natural Language > Large Language Model (1.00)