Theoretical Limits of Pipeline Parallel Optimization and Application to Distributed Deep Learning

Igor Colin, Ludovic Dos Santos, Kevin Scaman

Neural Information Processing Systems 

We investigate the theoretical limits of pipeline-parallel learning of deep learning architectures, a distributed setup in which computation is split per layer instead of per example. For smooth convex and non-convex objective functions, we provide matching lower and upper complexity bounds and show that a naive pipeline parallelization of Nesterov's accelerated gradient descent is optimal.
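To make the per-layer setup concrete, the following is a minimal sketch, not taken from the paper, of what a "naive" pipeline-style Nesterov accelerated gradient step could look like on a two-layer linear model: each layer is owned by one (here, simulated) worker, activations flow forward through the pipeline, gradient messages flow backward, and every worker applies a local Nesterov update to its own parameters. All variable names and hyperparameters are illustrative assumptions.

```python
# Hypothetical sketch of a naive pipeline-parallel Nesterov AGD loop (NumPy,
# workers simulated sequentially); not the paper's algorithm or code.
import numpy as np

rng = np.random.default_rng(0)

# Two-layer linear model; layer k is owned by worker k.
layers = [rng.standard_normal((4, 8)) * 0.1,   # worker 0
          rng.standard_normal((8, 1)) * 0.1]   # worker 1
velocity = [np.zeros_like(W) for W in layers]
lr, momentum = 0.1, 0.9

X = rng.standard_normal((32, 4))
y = rng.standard_normal((32, 1))

for step in range(200):
    # Nesterov lookahead point: each worker shifts its weights by momentum * velocity.
    lookahead = [W + momentum * v for W, v in zip(layers, velocity)]

    # Forward pass through the pipeline: worker k consumes the activations of
    # worker k-1 and sends its own activations downstream.
    acts = [X]
    for W in lookahead:
        acts.append(acts[-1] @ W)
    loss = 0.5 * np.mean((acts[-1] - y) ** 2)

    # Backward pass: the gradient message travels back up the pipeline.
    grad_out = (acts[-1] - y) / len(X)           # d(loss)/d(output)
    grads = [None] * len(layers)
    for k in reversed(range(len(layers))):
        grads[k] = acts[k].T @ grad_out          # gradient for worker k's weights
        grad_out = grad_out @ lookahead[k].T     # message sent to worker k-1

    # Each worker applies Nesterov's update locally to its own parameters.
    for k in range(len(layers)):
        velocity[k] = momentum * velocity[k] - lr * grads[k]
        layers[k] += velocity[k]

print("final loss:", loss)
```

In an actual pipeline-parallel deployment the forward and backward loops would be replaced by point-to-point communication between devices; the sketch only illustrates that each worker touches a single layer and exchanges activation and gradient messages with its neighbors.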