Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems 

This paper is an essentially theoretical contribution regarding convergence rates for the so-called "Hogwild"-style algorithms for stochastic gradient descent. In these algorithms, the gradient step is computed asynchronously over different chunks of the dataset in parallel, and each result updates the current weights as soon as it completes, independent of other parallel updates. Previously, demonstrating theoretical convergence for such algorithms has been difficult, and the resulting analyses somewhat brittle. The authors show that one of the variants covered by their analysis, "Buckwild", provides significant real-world speedups by using lower-precision arithmetic to compute the gradient steps. As far as the paper goes, it is generally good. I had little trouble reading and understanding the paper (I think), and the authors make a point of explaining the maths in an intuitive fashion, insofar as that is possible.
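
To make the summary above concrete, here is a minimal sketch of a Hogwild-style update loop on a toy least-squares problem. It is my own illustration, not the authors' code: the function name `hogwild_sgd`, its parameters, and the toy objective are all assumptions. The `low_precision` flag rounds each gradient to float16 as a rough stand-in for the lower-precision arithmetic of "Buckwild"; the paper's actual number format and implementation may well differ. Python threads also share an interpreter lock in standard CPython, so this shows the lock-free update pattern rather than any real speedup.

```python
import threading

import numpy as np


def hogwild_sgd(X, y, n_workers=4, epochs=20, lr=0.01,
                low_precision=False, seed=0):
    """Lock-free asynchronous SGD for least squares (illustrative only)."""
    n, d = X.shape
    w = np.zeros(d)  # shared weights; no lock protects them
    # Each worker owns one chunk of the row indices.
    chunks = np.array_split(np.arange(n), n_workers)

    def worker(rows, worker_seed):
        rng = np.random.default_rng(worker_seed)
        for _ in range(epochs):
            for i in rng.permutation(rows):
                # Read the shared weights; they may be stale or
                # partially updated by other workers.
                grad = (X[i] @ w - y[i]) * X[i]
                if low_precision:
                    # Assumption: float16 rounding stands in for
                    # low-precision gradient arithmetic.
                    grad = grad.astype(np.float16).astype(np.float64)
                # Write in place, without waiting for other workers.
                np.subtract(w, lr * grad, out=w)

    threads = [
        threading.Thread(target=worker, args=(rows, seed + k))
        for k, rows in enumerate(chunks)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return w


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 5))
    w_true = rng.normal(size=5)
    y = X @ w_true  # noiseless targets, so SGD can recover w_true
    for flag in (False, True):
        w_hat = hogwild_sgd(X, y, low_precision=flag)
        print(flag, np.linalg.norm(w_hat - w_true))
```

The point of the sketch is only that no worker waits for any other: each reads whatever weights are currently in memory and writes its step straight back. The question the paper addresses is under what conditions, and at what rate, such a scheme still converges.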