 Ludovic DOS SANTOS


Theoretical Limits of Pipeline Parallel Optimization and Application to Distributed Deep Learning

Neural Information Processing Systems

We investigate the theoretical limits of pipeline parallel optimization of deep learning architectures, a distributed setup in which computation is distributed per layer rather than per example. For smooth convex and non-convex objective functions, we provide matching lower and upper complexity bounds and show that a naive pipeline parallelization of Nesterov's accelerated gradient descent is optimal.
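For reference, the standard (non-pipelined) form of Nesterov's accelerated gradient descent referred to above is recalled below; here $\eta$ denotes the step size and $t$ the iteration index. The delayed-gradient pipeline variant analyzed in the paper is not reproduced here.

$$
y_t = x_t + \frac{t-1}{t+2}\,(x_t - x_{t-1}), \qquad x_{t+1} = y_t - \eta\,\nabla f(y_t)
$$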