Reviews: Fast Convergence of Natural Gradient Descent for Over-Parameterized Neural Networks
Neural Information Processing Systems
After rebuttal: I have carefully read the comments from the other reviewers and the authors' feedback. My main concern was the generalization ability of NGD, and the experiments in the feedback are a bit confusing to me: in the MNIST regression experiment, GD does not seem to reach zero training loss, while NGD converges to zero very quickly. I would suggest the authors provide more details about that experimental setting, e.g., how the hyperparameters were selected. I therefore keep my score unchanged.

The proof framework follows the recent line of work on over-parameterization, e.g., the papers by Du et al., Li and Liang, and Allen-Zhu et al., the core of which is the analysis of the Gram matrix.
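For context on the quantity this line of work centers on, here is a minimal NumPy sketch (my own illustration, not the authors' code or exact setup): it forms the Gram matrix G_ij = <grad f(x_i), grad f(x_j)> for a one-hidden-layer ReLU network with fixed output weights and only the hidden layer trained, then takes a damped Gauss-Newton step, which coincides with NGD under squared loss. The width m, step size eta, damping lam, and the random toy data are all arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 8, 5, 2048                  # samples, input dim, width (over-parameterized)
X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)
y = rng.normal(size=n)
W = rng.normal(size=(m, d))           # hidden weights (trained)
a = rng.choice([-1.0, 1.0], size=m)   # output weights (fixed)

def forward(W):
    # f(x) = (1/sqrt(m)) * sum_r a_r * relu(w_r . x)
    return (np.maximum(X @ W.T, 0.0) @ a) / np.sqrt(m)

def jacobian(W):
    # Row i is d f(x_i) / d vec(W); relu'(z) is an indicator.
    act = (X @ W.T > 0).astype(float)            # n x m
    J = (act * a)[:, :, None] * X[:, None, :]    # n x m x d
    return J.reshape(n, m * d) / np.sqrt(m)

eta, lam = 1.0, 1e-8
for step in range(5):
    u = forward(W)
    J = jacobian(W)
    # Gram (NTK) matrix: the object the over-parameterization proofs control.
    G = J @ J.T
    # Damped NGD / Gauss-Newton step: theta <- theta - eta * J^T (G + lam I)^{-1} (u - y).
    delta = J.T @ np.linalg.solve(G + lam * np.eye(n), u - y)
    W -= eta * delta.reshape(m, d)
    print(step, np.linalg.norm(forward(W) - y))
```

In the linearized (wide-network) regime this step contracts the residual u - y by a factor of roughly (1 - eta) per iteration regardless of the conditioning of G, which is the intuition behind the fast-convergence claim, whereas plain GD contracts at a rate governed by G's smallest eigenvalue.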