Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems 

This paper studies general-purpose training algorithms for deep learning and proposes a family of algorithms called elastic averaging SGD (EASGD). The idea is novel and the paper is of very high quality. The paper focuses on training large-scale deep learning models under communication constraints. This problem is difficult because non-convex objectives, such as those in deep learning, have many local optima. The optimization problem is formulated as a global variable consensus problem so that the local workers do not drift into different local optima, and the resulting gradient update rules are then reinterpreted in terms of elastic forces between the local and global parameters.
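To make the "elastic force" reading concrete, here is a minimal sketch of one synchronous elastic averaging step. It assumes the consensus objective is the sum over workers of f_i(x^i) + (rho/2)(x^i - x~)^2, with learning rate eta and elastic coefficient rho, so that alpha = eta * rho. Scalar parameters, the toy quadratic losses, and the names `easgd_step` and `targets` are illustrative assumptions, not taken from the paper under review. The key point is that the same elastic term alpha * (x^i - x~) is subtracted from each worker and added to the global variable, acting like a spring that links the two.

```python
def easgd_step(workers, center, grads, eta, rho):
    """One synchronous elastic averaging SGD step (illustrative sketch).

    workers: local parameters x^i (scalars here for simplicity)
    center:  global (consensus) parameter x~
    grads:   gradients of each local loss f_i evaluated at x^i
    eta:     learning rate; rho: elastic coefficient (assumed names)
    """
    alpha = eta * rho
    # Displacement of each worker from the global variable.
    diffs = [x - center for x in workers]
    # Each worker follows its own gradient and is pulled back toward the center.
    new_workers = [x - eta * g - alpha * d for x, g, d in zip(workers, grads, diffs)]
    # The center feels the equal-and-opposite pull from all workers.
    new_center = center + alpha * sum(diffs)
    return new_workers, new_center


# Hypothetical toy problem: worker i minimizes f_i(x) = 0.5 * (x - targets[i])**2.
targets = [1.0, 2.0, 3.0]
workers, center = [0.0, 0.0, 0.0], 0.0
for _ in range(500):
    grads = [x - c for x, c in zip(workers, targets)]
    workers, center = easgd_step(workers, center, grads, eta=0.1, rho=1.0)

print(round(center, 4), [round(x, 4) for x in workers])  # prints 2.0 [1.5, 2.0, 2.5]
```

In this toy run the global variable settles at the mean of the local optima, while each worker stays partway between its own optimum and the consensus value. How far apart the workers may stay is governed by rho: a larger rho enforces tighter consensus.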