Review for NeurIPS paper: Quantitative Propagation of Chaos for SGD in Wide Neural Networks

Additional Feedback: I have a few minor comments. Specifically: (1a) Depending on how one views the parameterization, the learning rate in previous papers on infinite-width SGD effectively depends on the number of hidden units N. What I mean is that you explicitly include the 1/N factor as the scale of the last-layer weights. This puts a factor of 1/N into the derivative d Loss / d W, where W is a weight in the first layer, which is akin to putting an extra 1/N into the learning rate. In previous papers (NTK-type analyses of deeper networks), this weight scale is sometimes N^{-1/2} instead.
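To make point (1a) concrete, here is a minimal numerical sketch (my own, not taken from the paper; the one-hidden-layer network f, the tanh nonlinearity, and the name first_layer_grad_scale are illustrative assumptions): under the 1/N output scaling, the gradient of the network output with respect to any single first-layer weight carries the explicit 1/N factor, so its typical magnitude shrinks like 1/N, which is equivalent to shrinking the effective learning rate by 1/N.

```python
import numpy as np

# Hypothetical one-hidden-layer network with mean-field (1/N) output scaling:
#   f(x) = (1/N) * sum_i a_i * tanh(w_i * x)
# The gradient w.r.t. a first-layer weight w_i carries the explicit 1/N factor:
#   df/dw_i = (1/N) * a_i * (1 - tanh(w_i * x)^2) * x
def first_layer_grad_scale(N, rng, x=1.0):
    a = rng.standard_normal(N)   # last-layer weights (before the 1/N scaling)
    w = rng.standard_normal(N)   # first-layer weights
    grad = (1.0 / N) * a * (1.0 - np.tanh(w * x) ** 2) * x
    return np.abs(grad).mean()   # typical per-weight gradient magnitude

rng = np.random.default_rng(0)
for N in (100, 1000, 10000):
    print(N, first_layer_grad_scale(N, rng))
```

Increasing N by a factor of 10 shrinks the typical per-weight gradient by roughly the same factor; with an N^{-1/2} weight scale, as in the NTK parameterization, the corresponding factor would be N^{-1/2} instead.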