AITopics | Gradient Descent

659e07806dc17bd69d0d9aed47f85e7c-Paper-Conference.pdf

Neural Information Processing Systems, Feb-12-2026, 21:26:44 GMT

artificial intelligence, machine learning, regularization, (17 more...)

Country:

Asia > China > Shanghai > Shanghai (0.04)
Africa > Kenya (0.04)

Genre: Research Report > New Finding (0.67)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.31)

Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data

Yuanzhi Li, Yingyu Liang

Neural Information Processing Systems, Feb-12-2026, 20:46:45 GMT

Neural networks have many successful applications, while much less theoretical understanding has been gained. Towards bridging this gap, we study the problem of learning a two-layer overparameterized ReLU neural network for multi-class classification via stochastic gradient descent (SGD) from random initialization. In the overparameterized setting, when the data comes from mixtures of well-separated distributions, we prove that SGD learns a network with a small generalization error, albeit the network has enough capacity to fit arbitrary labels. Furthermore, the analysis provides interesting insights into several aspects of learning neural networks and can be verified based on empirical studies on synthetic data and on the MNIST dataset.

artificial intelligence, initialization, machine learning, (16 more...)

Country:

North America > United States > Wisconsin > Dane County > Madison (0.14)
North America > United States > California > Santa Clara County > Stanford (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Stability and Generalization of the Decentralized Stochastic Gradient Descent Ascent Algorithm

Neural Information Processing Systems, Feb-12-2026, 20:36:48 GMT

However, it is insufficient to evaluate a stochastic algorithm without considering its generalization performance, which is roughly the gap between Eq.

artificial intelligence, generalization gap, machine learning, (17 more...)

Country:

Oceania > Australia > New South Wales > Sydney (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)
Asia > China > Hubei Province > Wuhan (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.86)

Regularization Matters: Generalization and Optimization of Neural Nets v.s. their Induced Kernel

Colin Wei, Jason D. Lee, Qiang Liu, Tengyu Ma

Neural Information Processing Systems, Feb-12-2026, 20:23:28 GMT

Recent works have shown that on sufficiently over-parametrized neural nets, gradient descent with relatively large initialization optimizes a prediction function in the RKHS of the Neural Tangent Kernel (NTK).

artificial intelligence, arxiv preprint arxiv, machine learning, (14 more...)

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > United States > Texas > Travis County > Austin (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.38)

On the Ineffectiveness of Variance Reduced Optimization for Deep Learning

Aaron Defazio, Leon Bottou

Neural Information Processing Systems, Feb-12-2026, 19:32:42 GMT

SVR methods use control variates to reduce the variance of the traditional stochastic gradient descent (SGD) estimate $f_i'(w)$ of the full gradient $f'(w)$. Control variates are a classical technique for reducing the variance of a stochastic quantity without introducing bias. Say we have some random variable X.
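The control-variate idea in this snippet can be illustrated with a minimal sketch. It is not code from the paper: the toy distribution, the function name `control_variate_estimates`, and the choice of control variate are all assumptions made for illustration. Given a random variable X and a correlated variable Z whose mean is known, the corrected sample X - (Z - E[Z]) has the same expectation as X but can have much lower variance.

```python
import random
import statistics


def control_variate_estimates(n_samples=20000, seed=0):
    """Compare plain Monte Carlo with a control-variate estimate of E[X].

    Hypothetical toy setup: Z ~ Uniform(0, 1) with known mean 0.5,
    and X = Z**2 + small Gaussian noise, so E[X] = 1/3.
    """
    rng = random.Random(seed)
    zs = [rng.random() for _ in range(n_samples)]
    xs = [z * z + 0.05 * rng.gauss(0.0, 1.0) for z in zs]

    z_mean_known = 0.5
    # Subtracting (Z - E[Z]) adds a zero-mean term, so the estimate stays
    # unbiased; because Z is correlated with X, the variance shrinks.
    corrected = [x - (z - z_mean_known) for x, z in zip(xs, zs)]

    return {
        "plain_mean": statistics.fmean(xs),
        "cv_mean": statistics.fmean(corrected),
        "plain_var": statistics.pvariance(xs),
        "cv_var": statistics.pvariance(corrected),
    }


if __name__ == "__main__":
    for name, value in control_variate_estimates().items():
        print(f"{name}: {value:.5f}")
```

Both means estimate the same quantity, while the per-sample variance of the corrected samples is smaller. In stochastic variance reduced methods the role of Z is typically played by the same example's gradient evaluated at a stored snapshot of the weights, whose average over the dataset is computed periodically.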

artificial intelligence, machine learning, variance reduction, (15 more...)

Country: North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.05)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)

The Physical Systems Behind Optimization Algorithms

Lin Yang, Raman Arora, Vladimir Braverman, Tuo Zhao

Neural Information Processing Systems, Feb-12-2026, 19:30:54 GMT

Neural Information Processing Systems http://nips.cc/

algorithm, convergence, convergence rate, (13 more...)

Country:

Asia > Middle East > Jordan (0.04)
North America > Canada > Quebec > Montreal (0.04)
Europe > Russia (0.04)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.75)

Sampling in Constrained Domains with Orthogonal-Space Variational Gradient Descent

Neural Information Processing Systems, Feb-12-2026, 19:16:20 GMT

artificial intelligence, machine learning, neural information processing system, (12 more...)

Country:

Asia > Singapore (0.05)
North America > United States > Texas (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.40)

Revisit last-iterate convergence of mSGD under milder requirement on step size

Neural Information Processing Systems, Feb-12-2026, 17:02:50 GMT

Understanding convergence of stochastic gradient descent (SGD) based optimization algorithms can help deal with enormous machine learning problems.

artificial intelligence, convergence, machine learning, (17 more...)

Country:

North America > Canada > Ontario > Toronto (0.04)
Asia > China > Liaoning Province > Shenyang (0.04)
North America > United States > Indiana > St. Joseph County > Notre Dame (0.04)
(5 more...)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

7a2b33c672ce223b2aa5789171ddde2f-Paper.pdf

Neural Information Processing Systems, Feb-12-2026, 16:31:15 GMT

algorithm, descent, gradient descent, (15 more...)

Country:

North America > Canada > Quebec > Montreal (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
(5 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.45)

A Novel Framework for Policy Mirror Descent with General Parameterization and Linear Convergence

Carlo Alfano (Department of Statistics, University of Oxford)

Neural Information Processing Systems, Feb-12-2026, 16:30:57 GMT

In this work, we introduce a framework for policy optimization based on mirror descent that naturally accommodates general parameterizations. The policy class induced by our scheme recovers known classes, e.g., softmax, and generates new ones depending on the choice of mirror map.
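The link between a mirror map and the softmax policy class can be sketched in the simplest possible case. The code below is an illustration only, not the paper's framework: it assumes a single state, fixed action values, and the negative-entropy mirror map, and the names `mirror_ascent_step` and `run_mirror_ascent` are hypothetical. Under those assumptions the mirror step has a closed form in which each action probability is multiplied by the exponential of its scaled value and the result is renormalized.

```python
import math


def mirror_ascent_step(policy, q_values, step_size):
    """One mirror-ascent step on the simplex with the negative-entropy map.

    Closed form: new_pi(a) is proportional to pi(a) * exp(step_size * Q(a)).
    """
    logits = [math.log(p) + step_size * q for p, q in zip(policy, q_values)]
    max_logit = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(v - max_logit) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


def run_mirror_ascent(q_values, step_size=0.5, n_steps=50):
    """Iterate the step from a uniform policy (assumed fixed action values)."""
    policy = [1.0 / len(q_values)] * len(q_values)
    for _ in range(n_steps):
        policy = mirror_ascent_step(policy, q_values, step_size)
    return policy


if __name__ == "__main__":
    print(run_mirror_ascent([1.0, 0.5, 0.0]))
```

Starting from a uniform policy with fixed values, the iterate after n steps is the softmax of n * step_size * Q, which is how this particular mirror map yields a softmax policy. Other mirror maps lead to different update rules and hence different policy classes.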

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.50)
North America > United States > Illinois > Cook County > Chicago (0.04)
Europe > Russia (0.04)
(3 more...)

Genre: Research Report (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.46)
