Neural Networks Learning and Memorization with (almost) no Over-Parameterization

Neural Information Processing Systems 

Many results in recent years established polynomial time learnability of various models via neural network algorithms. However, unless the model is linearly separable \cite{brutzkus2018sgd}, or the activation is a polynomial \cite{ge2019mildly}, these results require very large networks -- much larger than what is needed for the mere existence of a good predictor. In this paper we prove that SGD on depth two neural networks can memorize samples, learn polynomials with bounded weights, and learn certain kernel spaces, with {\em near optimal} network size, sample complexity, and runtime. In particular, we show that SGD on a depth two network with $\tilde{O}\left(\frac{m}{d}\right)$ hidden neurons (and hence $\tilde{O}(m)$ parameters) can memorize $m$ random labeled points in $\mathbb{S}^{d-1}$.
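To make the memorization statement concrete, the following is a minimal PyTorch sketch of the setting (an assumed setup for illustration, not the paper's exact architecture, initialization, loss, or constants): a depth two ReLU network with on the order of $\tilde{O}(m/d)$ hidden neurons, trained by plain SGD on $m$ random points of $\mathbb{S}^{d-1}$ with uniform $\pm 1$ labels. All hyperparameters below are illustrative choices.

\begin{verbatim}
# Illustrative sketch only: depth-two ReLU net with ~O~(m/d) hidden neurons,
# trained by SGD to fit m random +/-1 labels on random points of S^{d-1}.
import math
import torch

torch.manual_seed(0)

d, m = 100, 2000                          # input dimension, number of samples
k = max(1, int((m / d) * math.log(m)))    # assumed width ~ (m/d) * log m

# m random labeled points on the unit sphere S^{d-1}
X = torch.randn(m, d)
X = X / X.norm(dim=1, keepdim=True)
y = torch.randint(0, 2, (m,)).float() * 2 - 1   # uniform +/-1 labels

# depth-two network x -> v^T ReLU(Wx); roughly k*d = O~(m) parameters
model = torch.nn.Sequential(
    torch.nn.Linear(d, k, bias=False),
    torch.nn.ReLU(),
    torch.nn.Linear(k, 1, bias=False),
)

opt = torch.optim.SGD(model.parameters(), lr=0.1)
for step in range(5000):
    opt.zero_grad()
    out = model(X).squeeze(1)
    loss = torch.nn.functional.softplus(-y * out).mean()  # logistic loss
    loss.backward()
    opt.step()

acc = (model(X).squeeze(1).sign() == y).float().mean().item()
print(f"training accuracy after SGD: {acc:.3f}")
\end{verbatim}

The point of the sketch is the parameter count: the width $k$ scales like $m/d$ up to logarithmic factors, so the total number of trainable weights is $\tilde{O}(m)$, matching (up to log factors) the information-theoretic minimum needed to store $m$ arbitrary labels.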