Reviews: The Effect of Network Width on the Performance of Large-batch Training

Neural Information Processing Systems 

It has been widely discussed how to develop algorithms that allow large batches, so that neural networks can be trained efficiently in a distributed environment. The paper investigates the effect of network width on the performance of large-batch training, both theoretically and experimentally. The authors claim that, for a fixed number of parameters, neural networks with a wider architecture can more easily be trained with appropriately large batches. Theoretical support is given for 2-layer linear/nonlinear networks and multilayer linear networks. The paper is well-written and easy to follow.
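The "same number of parameters" comparison can be sketched concretely; the dimensions below are illustrative assumptions, not taken from the paper:

```python
def linear_param_count(layer_widths):
    """Total weight count (biases omitted) of a linear network whose
    layer widths are given as [input, hidden..., output]."""
    return sum(a * b for a, b in zip(layer_widths, layer_widths[1:]))

d_in, d_out = 100, 10
wide = [d_in, 200, d_out]          # 2-layer network with a wide hidden layer
deep = [d_in, 80, 80, 80, d_out]   # multilayer network with narrower layers

# Both architectures have a comparable parameter budget,
# which is the setting of the paper's wide-vs-deep comparison.
print(linear_param_count(wide))  # → 22000
print(linear_param_count(deep))  # → 21600
```

Under such a matched budget, the paper's claim is that the wider, shallower architecture tolerates larger training batches.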