AITopics | gradient descent converge linearly

Collaborating Authors

gradient descent converge linearly

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Gradient Descent Converges Linearly to Flatter Minima than Gradient Flow in Shallow Linear Networks

Beneventano, Pierfrancesco, Woodworth, Blake

arXiv.org Machine LearningJan-15-2025

We study the gradient descent (GD) dynamics of a depth-2 linear neural network with a single input and output. We show that GD converges at an explicit linear rate to a global minimum of the training loss, even with a large stepsize -- about $2/\textrm{sharpness}$. It still converges for even larger stepsizes, but may do so very slowly. We also characterize the solution to which GD converges, which has lower norm and sharpness than the gradient flow solution. Our analysis reveals a trade off between the speed of convergence and the magnitude of implicit regularization. This sheds light on the benefits of training at the ``Edge of Stability'', which induces additional regularization by delaying convergence and may have implications for training more complex models.

convergence, flatter minima, gradient descent converge linearly, (9 more...)

arXiv.org Machine Learning

2501.09137

Country:

North America > United States > New Jersey > Mercer County > Princeton (0.04)
North America > United States > District of Columbia > Washington (0.04)
Europe > Russia (0.04)
Asia > Russia (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.88)

Add feedback

Gradient Descent Converges Linearly for Logistic Regression on Separable Data

Axiotis, Kyriakos, Sviridenko, Maxim

arXiv.org Artificial IntelligenceJun-25-2023

We show that running gradient descent with variable learning rate guarantees loss $f(x) \leq 1.1 \cdot f(x^*) + \epsilon$ for the logistic regression objective, where the error $\epsilon$ decays exponentially with the number of iterations and polynomially with the magnitude of the entries of an arbitrary fixed solution $x^*$. This is in contrast to the common intuition that the absence of strong convexity precludes linear convergence of first-order methods, and highlights the importance of variable learning rates for gradient descent. We also apply our ideas to sparse logistic regression, where they lead to an exponential improvement of the sparsity-error tradeoff.

artificial intelligence, gradient descent converge linearly, machine learning, (12 more...)

arXiv.org Artificial Intelligence

2306.14381

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.95)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.93)

Add feedback

Local Geometry of One-Hidden-Layer Neural Networks for Logistic Regression

Fu, Haoyu, Chi, Yuejie, Liang, Yingbin

arXiv.org Machine LearningFeb-18-2018

We study the local geometry of a one-hidden-layer fully-connected neural network where the training samples are generated from a multi-neuron logistic regression model. We prove that under Gaussian input, the empirical risk function employing quadratic loss exhibits strong convexity and smoothness uniformly in a local neighborhood of the ground truth, for a class of smooth activation functions satisfying certain properties, including sigmoid and tanh, as soon as the sample complexity is sufficiently large. This implies that if initialized in this neighborhood, gradient descent converges linearly to a critical point that is provably close to the ground truth without requiring a fresh set of samples at each iteration. This significantly improves upon prior results on learning shallow neural networks with multiple neurons. To the best of our knowledge, this is the first global convergence guarantee for one-hidden-layer neural networks using gradient descent over the empirical risk function without resampling at the near-optimal sampling and computational complexity.

activation function, artificial intelligence, machine learning, (15 more...)

arXiv.org Machine Learning

1802.06463

Country: North America > United States (0.67)

Genre:

Research Report > New Finding (0.70)
Research Report > Experimental Study (0.60)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.57)

Add feedback