AITopics | Gradient Descent


Gradient Descent


A Standard Maximum Likelihood Estimation and Links to I

Neural Information Processing Systems | Aug-15-2025, 08:15:31 GMT

In the standard MLE setting [see, e.g., Murphy, 2012, Ch. 9] we are interested in learning the ... These two definitions are, however, essentially equivalent. Eq. (15) is a smooth objective that can be optimized with a (stochastic) gradient descent procedure. This section contains the proofs of the results relative to the perturb and map section (Section 3.2) and ... The proposition now follows from arguments made in Papandreou and Yuille [2011]. Its moment generating function has the form E[exp(tX)] = Γ(1 - τt). As mentioned in Johnson and Balakrishnan [p. ... Parts of the proof are inspired by a post on stackexchange, Xi'an [2016]. Theorem 1.
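The excerpt above refers to perturb-and-MAP and to a moment generating function of the form Γ(1 - τt). The following is a minimal sketch, not the paper's code. It assumes the variable in question is a zero-location Gumbel with scale τ (the excerpt does not name the distribution) and that the minus sign in Γ(1 - τt) was lost in extraction. All function names are hypothetical. It checks the closed form against a Monte Carlo estimate and illustrates the Gumbel-max trick that underlies perturb-and-MAP sampling.

```python
import math
import numpy as np

def gumbel_mgf_closed_form(t, tau):
    """Gamma(1 - tau*t): MGF of a Gumbel(loc=0, scale=tau) variable (assumed setting)."""
    if tau * t >= 1.0:
        raise ValueError("MGF is finite only for tau * t < 1")
    return math.gamma(1.0 - tau * t)

def gumbel_mgf_monte_carlo(t, tau, n=400_000, seed=0):
    """Monte Carlo estimate of E[exp(tX)] for X ~ Gumbel(0, tau)."""
    rng = np.random.default_rng(seed)
    x = rng.gumbel(loc=0.0, scale=tau, size=n)
    return float(np.mean(np.exp(t * x)))

def perturb_and_map_frequencies(logits, n=20_000, seed=0):
    """Gumbel-max trick: argmax(logits + Gumbel(0, 1) noise) follows softmax(logits)."""
    rng = np.random.default_rng(seed)
    logits = np.asarray(logits, dtype=float)
    noise = rng.gumbel(size=(n, logits.size))
    winners = np.argmax(logits + noise, axis=1)  # the "MAP" of the perturbed scores
    return np.bincount(winners, minlength=logits.size) / n
```

Calling perturb_and_map_frequencies(np.log([0.5, 0.3, 0.2])) should return empirical frequencies close to the input probabilities, up to sampling noise.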

dense layer, estimator, experiment, (12 more...)

Neural Information Processing Systems

Country:

Asia > China > Shaanxi Province > Xi'an (0.24)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)


Gradient Descent on Two-layer Nets: Margin Maximization and Simplicity Bias

Neural Information Processing Systems | Aug-15-2025, 00:48:17 GMT

On the pessimistic side, the paper suggests that such results are fragile.

artificial intelligence, classifier, machine learning, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Los Angeles County > Long Beach (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)

Genre: Research Report (0.88)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.66)


8493eeaccb772c0878f99d60a0bd2bb3-Supplemental.pdf

Neural Information Processing Systems | Aug-14-2025, 22:59:00 GMT

neural network, noisy label, subset, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
North America > United States > California > Santa Clara County > Stanford (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Genre: Research Report > New Finding (0.93)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.30)


Coresets for Robust Training of Neural Networks against Noisy Labels

Neural Information Processing Systems | Aug-14-2025, 22:58:53 GMT

There has been great empirical progress in robust training of neural networks against noisy labels.

neural network, noisy label, subset, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
North America > United States > California > Santa Clara County > Stanford (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
(2 more...)

Genre: Research Report > New Finding (0.68)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.30)


531230cfac80c65017ad0f85d3031edc-Paper-Conference.pdf

Neural Information Processing Systems | Aug-14-2025, 21:11:48 GMT

artificial intelligence, machine learning, optimization, (16 more...)

Neural Information Processing Systems

Country:

North America > Canada (0.28)
Europe > United Kingdom > England (0.14)
Oceania > Australia (0.14)
(6 more...)

Genre: Research Report (0.46)

Industry: Energy > Oil & Gas (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)


Impression learning: Online representation learning with synaptic plasticity (Appendices)

Neural Information Processing Systems | Aug-14-2025, 19:43:01 GMT

Our derivation of the update for IL (Eq. 3) is based on an expansion of log ... We examine the consequences of this bias formula for our specific model. Note that the update term in Eq. (S1) is ... However, we will show in Appendix C that these updates may have high variance. ... 'reparameterization trick,' in which a change of variables allows the use of stochastic gradient descent ... It is worth noting that this 'reparameterization' will work only for additive Gaussian noise. As already mentioned, WS can be viewed as a special case of IL. Since WS is a special case of IL, the bias properties of its individual samples are identical.
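The excerpt above mentions the reparameterization trick for additive Gaussian noise and notes that some updates may have high variance. The following is a minimal sketch of the general idea, not the paper's IL or WS updates. The test function f(z) = z^2 and all names are assumptions chosen for illustration. Writing z = mu + sigma * eps with eps ~ N(0, 1) lets the gradient of E[f(z)] with respect to mu pass through the sample. A score-function estimator of the same gradient is included for comparison.

```python
import numpy as np

def f(z):
    """Hypothetical test function; E[f(z)] = mu^2 + sigma^2 for z ~ N(mu, sigma^2)."""
    return z ** 2

def df(z):
    """Derivative of f with respect to z."""
    return 2.0 * z

def grad_samples(mu, sigma, n=200_000, seed=0):
    """Per-sample estimates of d/dmu E[f(z)] from two estimators."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(n)
    z = mu + sigma * eps            # change of variables: additive Gaussian noise
    reparam = df(z)                 # pathwise estimator: d f(mu + sigma*eps) / d mu
    score = f(z) * eps / sigma      # score-function estimator: f(z) * d log N(z; mu, sigma^2) / d mu
    return reparam, score
```

For mu = 1 and sigma = 1 the exact gradient is 2 * mu = 2. Both sample means should land near that value, while the per-sample variance of the pathwise estimator is noticeably smaller for this choice of f.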

artificial intelligence, machine learning, variance, (18 more...)

Neural Information Processing Systems

Genre: Instructional Material > Online (0.40)

Industry:

Energy > Oil & Gas (1.00)
Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)


Generalization Bounds for Gradient Methods via Discrete and Continuous Prior

Neural Information Processing Systems | Aug-14-2025, 12:29:58 GMT

Proving algorithm-dependent generalization error bounds for gradient-type optimization methods has attracted significant attention recently in learning theory. However, most existing trajectory-based analyses require either restrictive assumptions on the learning rate (e.g., fast decreasing learning rate), or continuous injected

generalization, neural information processing system, noise, (12 more...)

Neural Information Processing Systems

Country:

North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(4 more...)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.46)
(2 more...)


Analyzing Sharpness along GD Trajectory: Progressive Sharpening and Edge of Stability

Neural Information Processing Systems | Aug-14-2025, 09:57:57 GMT

Recent findings demonstrate that modern neural networks trained by full-batch gradient descent typically enter a regime called Edge of Stability (EOS).
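As background for the abstract above, the following toy sketch shows why a sharpness of 2 / (learning rate) is the classical stability threshold for gradient descent on a quadratic. It does not reproduce the Edge of Stability phenomenon in neural networks, and the function names are hypothetical. On f(x) = 0.5 * sharpness * x^2, each step multiplies x by (1 - lr * sharpness), so the iterates shrink only when the sharpness is below 2 / lr.

```python
def gd_on_quadratic(sharpness, lr, x0=1.0, steps=100):
    """Run gradient descent on f(x) = 0.5 * sharpness * x**2 and return the final iterate."""
    x = x0
    for _ in range(steps):
        x = x - lr * sharpness * x   # gradient of f at x is sharpness * x
    return x

def stability_threshold(lr):
    """Largest curvature for which gradient descent on a quadratic stays stable."""
    return 2.0 / lr
```

With lr = 0.1 the threshold is 20: a sharpness of 19 gives a contraction factor of magnitude 0.9 per step, while a sharpness of 21 gives a factor of magnitude 1.1 and the iterates grow.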

assumption 3, neural network, sharpness, (13 more...)

Neural Information Processing Systems

Country:

North America > Canada > Ontario > Toronto (0.14)
Asia > China > Shaanxi Province > Xi'an (0.04)
Asia > China > Jiangsu Province > Nanjing (0.04)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.36)


Global Convergence Analysis of Vanilla Gradient Descent for Asymmetric Matrix Completion

Zhang, Xu, Chen, Shuo, Li, Jinsheng, Pang, Xiangying, Gong, Maoguo

arXiv.org Artificial Intelligence | Aug-14-2025

This paper investigates the asymmetric low-rank matrix completion problem, which can be formulated as an unconstrained non-convex optimization problem with a nonlinear least-squares objective function, and is solved via gradient descent methods. Previous gradient descent approaches typically incorporate regularization terms into the objective function to guarantee convergence. However, numerical experiments and theoretical analysis of the gradient flow both demonstrate that the elimination of regularization terms in gradient descent algorithms does not adversely affect convergence performance. By introducing the leave-one-out technique, we inductively prove that the vanilla gradient descent with spectral initialization achieves a linear convergence rate with high probability. Besides, we demonstrate that the balancing regularization term exhibits a small norm during iterations, which reveals the implicit regularization property of gradient descent. Empirical results show that our algorithm has a lower computational cost while maintaining comparable completion performance compared to other gradient descent algorithms.
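The approach described in the abstract above (spectral initialization followed by gradient descent with no regularization term) can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation. The objective scaling by 1/p, the step size rule step_scale / sigma1, the problem sizes, and all function names are assumptions.

```python
import numpy as np

def spectral_init(M_obs, mask, rank):
    """Top-rank SVD of the rescaled observed matrix, split evenly between the two factors."""
    p = mask.mean()                                   # empirical sampling rate
    U, s, Vt = np.linalg.svd(M_obs / p, full_matrices=False)
    root = np.sqrt(s[:rank])
    return U[:, :rank] * root, Vt[:rank].T * root, s[0]

def vanilla_gd(M_obs, mask, rank, step_scale=0.2, iters=500):
    """Plain gradient descent on (1 / 2p) * ||mask * (X Y^T - M)||_F^2, with no regularizer."""
    p = mask.mean()
    X, Y, sigma1 = spectral_init(M_obs, mask, rank)
    X0, Y0 = X.copy(), Y.copy()
    eta = step_scale / sigma1                         # assumed step size rule
    for _ in range(iters):
        R = mask * (X @ Y.T - M_obs) / p              # residual on observed entries only
        X, Y = X - eta * (R @ Y), Y - eta * (R.T @ X) # simultaneous update of both factors
    return (X0, Y0), (X, Y)

def rel_error(X, Y, M):
    """Relative Frobenius error of the reconstruction X Y^T against M."""
    return np.linalg.norm(X @ Y.T - M) / np.linalg.norm(M)

def demo(n1=60, n2=50, rank=2, p=0.6, seed=0):
    """Synthetic low-rank instance; returns (error at initialization, error after GD)."""
    rng = np.random.default_rng(seed)
    M = rng.standard_normal((n1, rank)) @ rng.standard_normal((rank, n2))
    mask = (rng.random((n1, n2)) < p).astype(float)
    (X0, Y0), (X, Y) = vanilla_gd(mask * M, mask, rank)
    return rel_error(X0, Y0, M), rel_error(X, Y, M)
```

Calling demo() returns the reconstruction error at the spectral initialization and after the gradient descent iterations, so the two can be compared on a small, well-sampled synthetic problem.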

artificial intelligence, hypothesis 1, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2508.09685

Country:

Asia > China > Shaanxi Province > Xi'an (0.04)
Asia > China > Hong Kong (0.04)
North America > United States (0.04)
(4 more...)

Genre: Research Report > New Finding (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)


Domain-Generalization to Improve Learning in Meta-Learning Algorithms

Anjum, Usman, Stockman, Chris, Luong, Cat, Zhan, Justin

arXiv.org Artificial Intelligence | Aug-14-2025

This paper introduces Domain Generalization Sharpness-Aware Minimization Model-Agnostic Meta-Learning (DGS-MAML), a novel meta-learning algorithm designed to generalize across tasks with limited training data. DGS-MAML combines gradient matching with sharpness-aware minimization in a bi-level optimization framework to enhance model adaptability and robustness. We support our method with theoretical analysis using PAC-Bayes and convergence guarantees. Experimental results on benchmark datasets show that DGS-MAML outperforms existing approaches in terms of accuracy and generalization. The proposed method is particularly useful for scenarios requiring few-shot learning and quick adaptation, and the source code is publicly available on GitHub.
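The abstract above names sharpness-aware minimization (SAM) as one ingredient. The following sketch shows only a generic SAM update on a toy quadratic loss. It is not DGS-MAML, it omits gradient matching and the bi-level meta-learning loop, and the loss, hyperparameters, and function names are assumptions for illustration. Each step first moves to a nearby worst-case point along the normalized gradient, then descends using the gradient taken there.

```python
import numpy as np

A = np.diag([1.0, 10.0])   # hypothetical curvature matrix of a toy quadratic loss

def loss(w):
    """Toy loss 0.5 * w^T A w."""
    return 0.5 * w @ A @ w

def grad(w):
    """Gradient of the toy loss."""
    return A @ w

def sam_step(w, lr=0.05, rho=0.05, eps=1e-12):
    """One generic SAM update: ascend to a perturbed point, then descend with its gradient."""
    g = grad(w)
    w_adv = w + rho * g / (np.linalg.norm(g) + eps)   # perturbation of radius rho
    return w - lr * grad(w_adv)

def run_sam(w0, steps=300):
    """Apply repeated SAM steps starting from w0."""
    w = np.asarray(w0, dtype=float)
    for _ in range(steps):
        w = sam_step(w)
    return w
```

Because the gradient is evaluated at a perturbed point, the iterates settle into a small neighborhood of the minimizer whose size scales with rho, rather than converging to it exactly.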

algorithm, artificial intelligence, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2508.09418

Country: North America > United States (1.00)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.46)
