Reviews: Parameters as interacting particles: long time convergence and asymptotic error scaling of neural networks

Neural Information Processing Systems 

If so, I am confused why this is highlighted as a virtue of adding noise, since the purely deterministic dynamics of GD also evince this behavior.

Numerical experiments: These are slightly hard to interpret. First, which plots show SGD dynamics, and which are for GD? Second, I am puzzled by how to interpret the dotted lines in each plot. In the case of RBF, how are we to make sense of the empirical $n^{-2}$ decay? Is this somehow predicted by the analysis of GD, or is it an empirical phenomenon that is not theoretically addressed in this work?
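As an aside on the $n^{-2}$ question above: a minimal sketch of how such an empirical decay exponent is typically extracted from error-vs-width data, via a log-log regression. The data below are synthetic (generated to follow $n^{-2}$ with multiplicative noise) and merely stand in for values one would read off the paper's plots; the variable names are my own.

```python
import numpy as np

# Synthetic stand-in for error measurements at increasing widths n,
# generated as C * n^{-2} with small multiplicative noise.
rng = np.random.default_rng(0)
ns = np.array([10, 20, 40, 80, 160, 320])
errors = 3.0 * ns**-2.0 * np.exp(0.05 * rng.standard_normal(ns.size))

# The slope of log(error) vs log(n) estimates the exponent alpha
# in an assumed power law error ~ n^{-alpha}.
slope, intercept = np.polyfit(np.log(ns), np.log(errors), 1)
print(f"estimated decay exponent: {slope:.2f}")
```

A dotted reference line with slope $-2$ on a log-log plot would correspond exactly to this kind of fit, which is why it matters whether the exponent is predicted by the theory or only observed empirically.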