AITopics | neural network learn function


neural network learn function



SGD on Neural Networks Learns Functions of Increasing Complexity

Neural Information Processing Systems | Dec-25-2025, 21:42:11 GMT

We perform an experimental study of the dynamics of Stochastic Gradient Descent (SGD) in learning deep neural networks for several real and synthetic classification tasks. We show that in the initial epochs, almost all of the performance improvement of the classifier obtained by SGD can be explained by a linear classifier. More generally, we give evidence for the hypothesis that, as iterations progress, SGD learns functions of increasing complexity. This hypothesis can be helpful in explaining why SGD-learned classifiers tend to generalize well even in the over-parameterized regime. We also show that the linear classifier learned in the initial stages is "retained" throughout the execution even if training is continued to the point of zero training error, and complement this with a theoretical result in a simplified model. Key to our work is a new measure of how well one classifier explains the performance of another, based on conditional mutual information.
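The abstract says only that the measure is "based on conditional mutual information"; this listing does not give its exact definition. As a rough sketch, assume the score is the drop in information that a network's predictions F carry about the labels Y once a simpler classifier's predictions L are known, that is I(F; Y) - I(F; Y | L). The function names, the toy data, and this particular form of the score are all illustrative assumptions, and the plug-in entropy estimator is just one simple choice.

```python
import math
from collections import Counter


def entropy(samples):
    """Plug-in (empirical) Shannon entropy, in bits, of a list of hashable items."""
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in Counter(samples).values())


def mutual_information(f, y):
    """I(F; Y) = H(F) + H(Y) - H(F, Y), estimated from paired samples."""
    return entropy(list(f)) + entropy(list(y)) - entropy(list(zip(f, y)))


def conditional_mutual_information(f, y, l):
    """I(F; Y | L) = H(F, L) + H(Y, L) - H(F, Y, L) - H(L)."""
    return (entropy(list(zip(f, l))) + entropy(list(zip(y, l)))
            - entropy(list(zip(f, y, l))) - entropy(list(l)))


def explained_information(f, y, l):
    """Assumed illustrative score: I(F; Y) - I(F; Y | L).

    Large when L accounts for most of what F knows about Y, near zero when
    L says nothing about how F relates to Y.
    """
    return mutual_information(f, y) - conditional_mutual_information(f, y, l)


# Hypothetical toy data: true labels and a network's predictions (6 of 8 correct).
labels = [0, 0, 0, 0, 1, 1, 1, 1]
network = [0, 0, 0, 1, 1, 1, 1, 0]

# A "linear" classifier that makes exactly the same predictions as the network
# explains all of the network's information about the labels.
same_as_network = list(network)
# A constant classifier explains none of it.
constant = [0] * len(labels)

full = explained_information(network, labels, same_as_network)
none = explained_information(network, labels, constant)
```

Under this assumed score, `full` equals I(F; Y) and `none` is zero (up to floating-point error), which matches the two extremes the abstract alludes to: early in training a linear classifier explains almost all of the network's performance, while later the network carries information the linear classifier does not.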

classifier, name change, neural network learn function, (5 more...)


Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.61)


Reviews: SGD on Neural Networks Learns Functions of Increasing Complexity

Neural Information Processing Systems | Feb-11-2025, 23:04:59 GMT

There is a lot of support for the paper in the reviews. While much "folklore knowledge" exists around implicit regularization of SGD (e.g. [...] Some suggestions for improvement should be taken seriously, but all in all the paper makes a valuable contribution towards understanding the interplay of optimization and representational power (types of functions).

complexity, neural network learn function, sgd


Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.40)


Reviews: SGD on Neural Networks Learns Functions of Increasing Complexity

Neural Information Processing Systems | Jan-26-2025, 14:32:49 GMT

Update: I have read the authors' response and am keeping my score. Originality: The idea that SGD biases networks towards simple classifiers has been discussed extensively in the community, but lacked a crisp formalization. This paper proposes a formal description of this conjecture (and also extends it; see Conjecture 1 and Claim 2) using a rich yet simple information-theoretic framework. This work is loosely related to recent work on understanding the inductive bias of SGD in function space, e.g. Valle-Perez et al. and Savarese et al.; both analyze only the solution returned by SGD (under different assumptions), not the intermediate iterates as this paper does. Defining the 'simplicity' of a neural network solely through the mutual information between its predictions and those of another classifier leads to objects that are invariant to reparameterization of the network and depend only on the function that it implements.
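The reviewer's point about reparameterization can be illustrated with a small sketch. It is not taken from the paper: the tiny ReLU network, the weights, and the inputs below are hypothetical. Scaling the first layer by 2 and the second by 1/2 changes the parameters but not the function, so the predictions, and hence any statistic computed from predictions alone (such as an empirical mutual information), are unchanged.

```python
import random


def relu_net_predict(w1, w2, x):
    """Predict a 0/1 label with a one-hidden-layer ReLU network (scalar output)."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w1]
    out = sum(v * h for v, h in zip(w2, hidden))
    return 1 if out > 0 else 0


rng = random.Random(0)

# Hypothetical network: 3 inputs, 4 hidden units.
w1 = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(4)]
w2 = [rng.gauss(0, 1) for _ in range(4)]

# Reparameterize: ReLU is positively homogeneous, so scaling the first layer
# by 2 and the second layer by 1/2 leaves the implemented function unchanged.
# Powers of two keep the floating-point arithmetic exact.
w1_scaled = [[2.0 * w for w in row] for row in w1]
w2_scaled = [0.5 * v for v in w2]

inputs = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(200)]
preds_original = [relu_net_predict(w1, w2, x) for x in inputs]
preds_scaled = [relu_net_predict(w1_scaled, w2_scaled, x) for x in inputs]

# Different parameters, identical predictions on every input.
same_function = preds_original == preds_scaled
```

A complexity measure defined on the weights themselves (for example, a norm) would generally differ between the two parameterizations, whereas a measure defined on `preds_original` and `preds_scaled` cannot tell them apart.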

function space, neural network learn function, sgd, (6 more...)


Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.90)


SGD on Neural Networks Learns Functions of Increasing Complexity

Neural Information Processing Systems | Oct-10-2024, 19:06:36 GMT

We perform an experimental study of the dynamics of Stochastic Gradient Descent (SGD) in learning deep neural networks for several real and synthetic classification tasks. We show that in the initial epochs, almost all of the performance improvement of the classifier obtained by SGD can be explained by a linear classifier. More generally, we give evidence for the hypothesis that, as iterations progress, SGD learns functions of increasing complexity. This hypothesis can be helpful in explaining why SGD-learned classifiers tend to generalize well even in the over-parameterized regime. We also show that the linear classifier learned in the initial stages is "retained" throughout the execution even if training is continued to the point of zero training error, and complement this with a theoretical result in a simplified model.

classifier, complexity, neural network learn function, (3 more...)


Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.65)


SGD on Neural Networks Learns Functions of Increasing Complexity

Kalimeris, Dimitris, Kaplun, Gal, Nakkiran, Preetum, Edelman, Benjamin, Yang, Tristan, Barak, Boaz, Zhang, Haofeng

Neural Information Processing Systems | Mar-18-2020, 21:48:27 GMT

We perform an experimental study of the dynamics of Stochastic Gradient Descent (SGD) in learning deep neural networks for several real and synthetic classification tasks. We show that in the initial epochs, almost all of the performance improvement of the classifier obtained by SGD can be explained by a linear classifier. More generally, we give evidence for the hypothesis that, as iterations progress, SGD learns functions of increasing complexity. This hypothesis can be helpful in explaining why SGD-learned classifiers tend to generalize well even in the over-parameterized regime. We also show that the linear classifier learned in the initial stages is "retained" throughout the execution even if training is continued to the point of zero training error, and complement this with a theoretical result in a simplified model.

classifier, complexity, neural network learn function, (3 more...)


Genre: Research Report (0.71)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.65)
