Reviews: Theoretical Limits of Pipeline Parallel Optimization and Application to Distributed Deep Learning

Neural Information Processing Systems 

The relationship between the proposed pipeline parallel optimization setting and existing work is not clear. Does the proposed setting contain related work as special cases? The authors mention in the abstract that the presented study is distributed per layer instead of per sample. It would be helpful to provide an additional comparison along this line. This is briefly touched on in Section 2, in the discussion of asynchronous value/gradient evaluation.