Reviews: Parameters as interacting particles: long time convergence and asymptotic error scaling of neural networks

Neural Information Processing Systems 

If so, I am confused why this is highlighted as a virtue of adding noise, since the purely deterministic dynamics of GD also evince this behavior.

Numerical experiments: These are slightly hard to interpret. First, which plots show SGD dynamics, and which are for GD? Second, I am puzzled by how to interpret the dotted lines in each plot. In the case of RBF, how are we to make sense of the empirical $n^{-2}$ decay? Is this somehow predicted by the analysis of GD, or is it an empirical phenomenon that is not theoretically addressed in this work?
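As an aside on the $n^{-2}$ question above: a minimal sketch of how such an empirical decay exponent is typically extracted from error-vs-width data, via a log-log regression. The data below are synthetic (generated to follow $n^{-2}$ with multiplicative noise) and merely stand in for values one would read off the paper's plots; the variable names are my own.

```python
import numpy as np

# Synthetic stand-in for error measurements at increasing widths n,
# generated as C * n^{-2} with small multiplicative noise.
rng = np.random.default_rng(0)
ns = np.array([10, 20, 40, 80, 160, 320])
errors = 3.0 * ns**-2.0 * np.exp(0.05 * rng.standard_normal(ns.size))

# The slope of log(error) vs log(n) estimates the exponent alpha
# in an assumed power law error ~ n^{-alpha}.
slope, intercept = np.polyfit(np.log(ns), np.log(errors), 1)
print(f"estimated decay exponent: {slope:.2f}")
```

A dotted reference line with slope $-2$ on a log-log plot would correspond exactly to this kind of fit, which is why it matters whether the exponent is predicted by the theory or only observed empirically.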