AITopics | Gradient Descent

Collaborating Authors

Gradient Descent

News Overviews Instructional Materials AI-Alerts Classics

Alternating Gradient Descent and Mixture-of-Experts for Integrated Multimodal Perception

Neural Information Processing SystemsFeb-18-2026, 02:00:12 GMT

IMP-MoE-L focusing on video tasks that achieves new state-of-the-art in zero-shot video classification: 77.0% on Kinetics-400, 76.8% on Kinetics-600, and 68.3% on

artificial intelligence, machine learning, natural language, (16 more...)

Neural Information Processing Systems

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)

Genre: Research Report (0.46)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.41)

Add feedback

Contextual Stochastic Bilevel Optimization

Neural Information Processing SystemsFeb-18-2026, 00:41:49 GMT

Multilevel Monte-Carlo (MLMC) technique and establish its sample and computational complexities.

artificial intelligence, estimator, machine learning, (17 more...)

Neural Information Processing Systems

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States (0.14)

Technology:

Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.46)

Add feedback

On the Convergence to a Global Solution of Shuffling-Type Gradient Algorithms Lam M. Nguyen

Neural Information Processing SystemsFeb-17-2026, 20:41:06 GMT

Stochastic gradient descent (SGD) algorithm is the method of choice in many machine learning tasks thanks to its scalability and efficiency in dealing with large-scale problems. In this paper, we focus on the shuffling version of SGD which matches the mainstream practical heuristics. We show the convergence to a global solution of shuffling SGD for a class of non-convex functions under over-parameterized settings.

artificial intelligence, convergence, machine learning, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > California (0.04)
North America > United States > New York > Tompkins County > Ithaca (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
(7 more...)

Genre: Research Report (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Adaptive Proximal Gradient Method for Convex Optimization

Neural Information Processing SystemsFeb-17-2026, 16:09:50 GMT

In this paper, we explore two fundamental first-order algorithms in convex optimization, namely, gradient descent (GD) and proximal gradient method (ProxGD). Our focus is on making these algorithms entirely adaptive by leveraging local curvature information of smooth functions.

artificial intelligence, machine learning, stepsize, (17 more...)

Neural Information Processing Systems

Country:

Europe > Austria > Vienna (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.35)

Add feedback

Resetting the Optimizer in Deep RL: An Empirical Study

Neural Information Processing SystemsFeb-17-2026, 16:08:22 GMT

We focus on the task of approximating the optimal value function in deep reinforcement learning. This iterative process is comprised of solving a sequence of optimization problems where the loss function changes per iteration. The common approach to solving this sequence of problems is to employ modern variants of the stochastic gradient descent algorithm such as Adam. These optimizers maintain their own internal parameters such as estimates of the first-order and the second-order moments of the gradient, and update them over time. Therefore, information obtained in previous iterations is used to solve the optimization problem in the current iteration. We demonstrate that this can contaminate the moment estimates because the optimization landscape can change arbitrarily from one iteration to the next one. To hedge against this negative effect, a simple idea is to reset the internal parameters of the optimizer when starting a new iteration. We empirically investigate this resetting idea by employing various optimizers in conjunction with the Rainbow algorithm. We demonstrate that this simple modification significantly improves the performance of deep RL on the Atari benchmark.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Country:

North America > Canada > Alberta (0.14)
North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
Europe > Russia (0.04)
Asia > Russia (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.69)

Add feedback

Trajectory Alignment: Understanding the Edge of Stability Phenomenon via Bifurcation Theory

Neural Information Processing SystemsFeb-17-2026, 14:56:02 GMT

Cohen et al. (2021) empirically study the evolution of the largest eigenvalue of the loss Hessian, also known as sharpness, along the gradient descent (GD) trajectory and observe the Edge of Stability (EoS) phenomenon.

artificial intelligence, machine learning, trajectory, (18 more...)

Neural Information Processing Systems

Country: North America > United States > Massachusetts > Middlesex County > Reading (0.04)

Genre: Research Report > New Finding (0.92)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Tight Risk Bounds for Gradient Descent on Separable Data

Neural Information Processing SystemsFeb-17-2026, 10:08:57 GMT

Recently, there has been a marked increase in interest regarding the generalization capabilities of unregularized gradient-based learning methods.

artificial intelligence, gradient descent, machine learning, (17 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)
Asia > China (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.68)

Add feedback

Tight Risk Bounds for Gradient Descent on Separable Data

Neural Information Processing SystemsFeb-17-2026, 10:08:54 GMT

Recently, there has been a marked increase in interest regarding the generalization capabilities of unregularized gradient-based learning methods.

artificial intelligence, gradient descent, machine learning, (14 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)
Asia > China (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)

Add feedback

acccb791886d5d811fe4e16e98c26633-Paper-Conference.pdf

Neural Information Processing SystemsFeb-17-2026, 09:24:39 GMT

algorithm, artificial intelligence, machine learning, (18 more...)

Neural Information Processing Systems

Country:

Europe > Austria > Vienna (0.14)
Europe > Spain > Andalusia > Seville Province > Seville (0.04)
Europe > Spain > Galicia > Madrid (0.04)
(9 more...)

Genre: Research Report (1.00)

Industry:

Transportation > Infrastructure & Services (0.46)
Transportation > Air (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Communications > Networks (0.68)
Information Technology > Data Science (0.67)
(2 more...)

Add feedback

Parameter Symmetry and Noise Equilibrium of Stochastic Gradient Descent Liu Ziyin Massachusetts Institute of Technology, NTT Research

Neural Information Processing SystemsFeb-17-2026, 08:10:19 GMT

Symmetries are prevalent in deep learning and can significantly influence the learning dynamics of neural networks. In this paper, we examine how exponential symmetries - a broad subclass of continuous symmetries present in the model architecture or loss function - interplay with stochastic gradient descent (SGD). We first prove that gradient noise creates a systematic motion (a "Noether flow") of the parameters θ along the degenerate direction to a unique initialization-independent fixed point θ

artificial intelligence, machine learning, symmetry, (15 more...)

Neural Information Processing Systems

Country: