AITopics | overparameterized learning

Collaborating Authors

overparameterized learning

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Characterizing Dynamical Stability of Stochastic Gradient Descent in Overparameterized Learning

Chemnitz, Dennis, Engel, Maximilian

arXiv.org Artificial IntelligenceSep-18-2024

For overparameterized optimization tasks, such as the ones found in modern machine learning, global minima are generally not unique. In order to understand generalization in these settings, it is vital to study to which minimum an optimization algorithm converges. The possibility of having minima that are unstable under the dynamics imposed by the optimization algorithm limits the potential minima that the algorithm can find. In this paper, we characterize the global minima that are dynamically stable/unstable for both deterministic and stochastic gradient descent (SGD). In particular, we introduce a characteristic Lyapunov exponent which depends on the local dynamics around a global minimum and rigorously prove that the sign of this Lyapunov exponent determines whether SGD can accumulate at the respective global minimum.

characterizing dynamical stability, overparameterized learning, stochastic gradient descent

arXiv.org Artificial Intelligence

2407.20209

Genre: Research Report (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Add feedback

A Framework for Overparameterized Learning

Terjék, Dávid, González-Sánchez, Diego

arXiv.org Artificial IntelligenceFeb-13-2023

A candidate explanation of the good empirical performance of deep neural networks is the implicit regularization effect of first order optimization methods. Inspired by this, we prove a convergence theorem for nonconvex composite optimization, and apply it to a general learning problem covering many machine learning applications, including supervised learning. We then present a deep multilayer perceptron model and prove that, when sufficiently wide, it $(i)$ leads to the convergence of gradient descent to a global optimum with a linear rate, $(ii)$ benefits from the implicit regularization effect of gradient descent, $(iii)$ is subject to novel bounds on the generalization error, $(iv)$ exhibits the lazy training phenomenon and $(v)$ enjoys learning rate transfer across different widths. The corresponding coefficients, such as the convergence rate, improve as width is further increased, and depend on the even order moments of the data generating distribution up to an order depending on the number of layers. The only non-mild assumption we make is the concentration of the smallest eigenvalue of the neural tangent kernel at initialization away from zero, which has been shown to hold for a number of less general models in contemporary works. We present empirical evidence supporting this assumption as well as our theoretical claims.

artificial intelligence, deep learning, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2205.13507

Country:

North America > United States > California > Los Angeles County > Long Beach (0.04)
North America > Canada > Quebec > Montreal (0.04)
North America > Canada > Alberta > Census Division No. 15 > Improvement District No. 9 > Banff (0.04)
(5 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback