Theoretical Limits of Pipeline Parallel Optimization and Application to Distributed Deep Learning
Igor Colin, Ludovic Dos Santos, Kevin Scaman
Neural Information Processing Systems
We investigate the theoretical limits of pipeline parallel learning of deep learning architectures, a distributed setup in which computation is partitioned across layers rather than across examples. For smooth convex and non-convex objective functions, we provide matching lower and upper complexity bounds and show that a naive pipeline parallelization of Nesterov's accelerated gradient descent is optimal.
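The sketch below illustrates the setup the abstract describes: a layer-wise (pipeline) partition of the model, where each layer's forward and backward computation is handled by its own worker, combined with Nesterov's accelerated gradient updates. It is a minimal illustration, not the authors' implementation; the two-layer linear model, the simulated sequential workers, and the hyperparameter values are assumptions chosen for brevity.

```python
# Minimal sketch (not the paper's code): Nesterov-accelerated gradient descent
# on a two-layer linear model, with each layer's forward/backward step written
# as if handled by a separate pipeline worker. The per-layer split is simulated
# sequentially on one machine; all names and constants are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n, d_in, d_h, d_out = 64, 10, 8, 4
X = rng.standard_normal((d_in, n))
Y = rng.standard_normal((d_out, n))

# Worker 1 owns layer 1, worker 2 owns layer 2 (layer-wise partition).
W1 = rng.standard_normal((d_h, d_in)) * 0.1
W2 = rng.standard_normal((d_out, d_h)) * 0.1
V1, V2 = np.zeros_like(W1), np.zeros_like(W2)

lr, mu = 0.05, 0.9  # step size and momentum (illustrative values)

def grads(W1, W2):
    """Forward/backward pass split per layer, as in a pipeline setup."""
    h = W1 @ X               # worker 1: forward through layer 1
    out = W2 @ h             # worker 2: forward through layer 2
    d_out_ = (out - Y) / n   # worker 2: gradient of the loss w.r.t. output
    g2 = d_out_ @ h.T        # worker 2: gradient of its own layer
    d_h = W2.T @ d_out_      # worker 2 -> worker 1: backward message
    g1 = d_h @ X.T           # worker 1: gradient of its own layer
    return g1, g2, 0.5 * np.sum((out - Y) ** 2) / n

for step in range(200):
    # Nesterov lookahead: gradients are evaluated at the extrapolated point.
    g1, g2, loss = grads(W1 + mu * V1, W2 + mu * V2)
    V1 = mu * V1 - lr * g1
    V2 = mu * V2 - lr * g2
    W1 += V1
    W2 += V2

print(f"final loss (at last lookahead point): {loss:.4f}")
```

In an actual pipeline parallel deployment, the two workers would run on separate devices and exchange only activations and backward messages, which is what makes the per-layer partition communication-efficient relative to shipping parameters or examples.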