Coordinating Distributed Example Orders for Provably Accelerated Training
A. Feder Cooper, Wentao Guo, Khiem Pham, Tiancheng Yuan, Charlie F. Ruan, Yucheng Lu, Christopher De Sa
Recent research on online Gradient Balancing (GraB) has revealed that there exist permutation-based example orderings for SGD that are guaranteed to outperform random reshuffling (RR). Whereas RR arbitrarily permutes training examples, GraB leverages stale gradients from prior epochs to order examples -- achieving a provably faster convergence rate than RR. However, GraB is limited by design: while it demonstrates an impressive ability to scale up training on centralized data, it does not naturally extend to modern distributed ML workloads. We therefore propose Coordinated Distributed GraB (CD-GraB), which uses insights from prior work on kernel thinning to translate the benefits of provably faster permutation-based example ordering to distributed settings. With negligible overhead, CD-GraB exhibits a linear speedup in convergence rate over centralized GraB and outperforms distributed RR on a variety of benchmark tasks.
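To make the ordering idea concrete, the following minimal Python sketch contrasts random reshuffling with a simplified, hypothetical GraB-style reorder that greedily balances prefix sums of stale, mean-centered per-example gradients saved from the previous epoch. It illustrates the general approach described in the abstract, not the paper's exact algorithm; the function names and details are assumptions.

    import numpy as np

    def grab_style_reorder(stale_grads):
        """Hypothetical, simplified sketch of a GraB-style reorder.

        stale_grads: array of shape (n, d) holding per-example gradients
        saved from the previous epoch. Returns a permutation of range(n)
        that keeps prefix sums of the centered gradients small.
        """
        n, d = stale_grads.shape
        centered = stale_grads - stale_grads.mean(axis=0)  # zero-mean over the epoch
        running = np.zeros(d)
        front, back = [], []
        for i in range(n):
            c = centered[i]
            # Greedy sign choice: keep the running prefix sum small.
            if np.linalg.norm(running + c) <= np.linalg.norm(running - c):
                running += c
                front.append(i)        # examples assigned +1 go to the front
            else:
                running -= c
                back.append(i)         # examples assigned -1 go to the back, reversed
        return front + back[::-1]

    def random_reshuffle(n, rng):
        """Random reshuffling (RR): a fresh uniform permutation each epoch."""
        return rng.permutation(n)

In this sketch, the GraB-style order is deterministic given the stale gradients, whereas RR draws a new random permutation each epoch; the distributed coordination across workers that CD-GraB contributes is not shown here.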
arXiv.org Artificial Intelligence
Dec-21-2023