Reviews: Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data

Neural Information Processing Systems 

This paper studies learning over-parameterized one-hidden-layer ReLU neural networks for multi-class classification via SGD, and the corresponding generalization error. The authors consider a mixture data distribution in which each class has well-separated, compact support. They show that, under suitable assumptions, SGD applied to this learning model achieves small prediction error with high probability. As a result, even severely over-parameterized models trained with SGD generalize well, despite the network having enough capacity to fit arbitrary labels. The main insight of the theoretical analysis appears to be the observation that, in the over-parameterized regime with random initialization, most ReLU neurons do not change their activation patterns during training.
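
To make this insight concrete, the following is a minimal NumPy sketch (not the authors' experimental setup; the data model, width, learning rate, and loss are illustrative assumptions) that trains a wide one-hidden-layer ReLU network with SGD on a toy two-cluster mixture with compact supports, then measures how many (neuron, sample) activation entries flip relative to initialization:

```python
import numpy as np

rng = np.random.default_rng(0)

d, m, n = 20, 2000, 200          # input dim, hidden width (over-parameterized), samples

# Toy stand-in for the paper's mixture assumption: two well-separated
# classes supported on small balls around unit-norm means.
mu = rng.normal(size=(2, d))
mu /= np.linalg.norm(mu, axis=1, keepdims=True)
labels = rng.integers(0, 2, size=n)
noise = rng.normal(size=(n, d))
noise = 0.1 * noise / np.linalg.norm(noise, axis=1, keepdims=True)  # radius-0.1 support
X = mu[labels] + noise
y = 2.0 * labels - 1.0           # +/-1 targets

# One-hidden-layer ReLU net: f(x) = a^T relu(W x), with the output
# weights a_r fixed at +/-1 and only W trained (a common setup in
# this line of analysis).
W0 = rng.normal(size=(m, d)) / np.sqrt(d)
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)
W = W0.copy()

def activation_pattern(W):
    return X @ W.T > 0           # n x m boolean pattern over the training set

pattern0 = activation_pattern(W0)

lr = 0.5
for epoch in range(20):
    for i in rng.permutation(n):
        x = X[i]
        z = W @ x
        f = a @ np.maximum(z, 0.0)
        # SGD step on squared loss; the gradient touches only active units.
        grad = (f - y[i]) * (a * (z > 0))[:, None] * x[None, :]
        W -= lr * grad

flip_frac = np.mean(activation_pattern(W) != pattern0)
acc = np.mean(np.sign(np.maximum(X @ W.T, 0.0) @ a) == y)
print(f"train accuracy: {acc:.3f}, activation entries flipped: {flip_frac:.4%}")
```

Under these assumptions one would expect the network to fit the training data while only a small fraction of activation entries flip, which is the qualitative phenomenon the paper's analysis exploits: near random initialization, the network behaves almost linearly in its weights.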