AITopics | Gradient Descent

Collaborating Authors

Gradient Descent

News Overviews Instructional Materials AI-Alerts Classics

Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation

Yuhuai Wu, Elman Mansimov, Roger B. Grosse, Shun Liao, Jimmy Ba

Neural Information Processing SystemsNov-21-2025, 07:08:24 GMT

In this work, we propose to apply trust region optimization to deep reinforcement learning using a recently proposed Kronecker-factored approximation to the curvature.

approximation, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Country:

North America > Canada > Ontario > Toronto (0.15)
Asia > Middle East > Jordan (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)

Industry: Leisure & Entertainment > Games (0.49)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.30)

Add feedback

Gradient Methods for Submodular Maximization

Hamed Hassani, Mahdi Soltanolkotabi, Amin Karbasi

Neural Information Processing SystemsNov-21-2025, 05:52:21 GMT

In particular, it allows us to approximately maximize discrete, monotone submodular optimization problems via projected gradient ascent on a continuous relaxation, directly connecting the discrete and continuous domains.

artificial intelligence, machine learning, optimization problem, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
North America > United States > Connecticut > New Haven County > New Haven (0.04)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.89)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.48)

Add feedback

Stein Variational Gradient Descent as Gradient Flow

Qiang Liu

Neural Information Processing SystemsNov-21-2025, 05:07:41 GMT

This paper develops the first theoretical analysis on SVGD.

artificial intelligence, machine learning, particle, (15 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > New Hampshire > Grafton County > Hanover (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.42)

Add feedback

Breaking the Nonsmooth Barrier: A Scalable Parallel Method for Composite Optimization

Fabian Pedregosa, Rémi Leblond, Simon Lacoste-Julien

Neural Information Processing SystemsNov-21-2025, 04:09:02 GMT

Due to their simplicity and excellent performance, parallel asynchronous variants of stochastic gradient descent have become popular methods to solve a wide range of large-scale optimization problems on multi-core architectures. Y et, despite their practical success, support for nonsmooth objectives is still lacking, making them unsuitable for many problems of interest in machine learning, such as the Lasso, group Lasso or empirical risk minimization with convex constraints.

algorithm, artificial intelligence, machine learning, (16 more...)

Neural Information Processing Systems

Country:

Europe > France > Île-de-France > Paris > Paris (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
North America > Canada (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.56)

Add feedback

Stochastic Gradient Richardson-Romberg Markov Chain Monte Carlo

Alain Durmus, Umut Simsekli, Eric Moulines, Roland Badeau, Gaël RICHARD

Neural Information Processing SystemsNov-21-2025, 04:08:25 GMT

Stochastic Gradient Markov Chain Monte Carlo (SG-MCMC) algorithms have become increasingly popular for Bayesian inference in large-scale applications.

artificial intelligence, extrapolation, machine learning, (19 more...)

Neural Information Processing Systems

Country:

Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)

Genre: Research Report > New Finding (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.74)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.70)

Add feedback

AdamNX: An Adam improvement algorithm based on a novel exponential decay mechanism for the second-order moment estimate

Zhu, Meng, Xiao, Quan, Min, Weidong

arXiv.org Machine LearningNov-21-2025

Since the 21st century, artificial intelligence has been leading a new round of industrial revolution. Under the training framework, the optimization algorithm aims to stably converge high-dimensional optimization to local and even global minima. Entering the era of large language models, although the scale of model parameters and data has increased, Adam remains the mainstream optimization algorithm. However, compared with stochastic gradient descent (SGD) based optimization algorithms, Adam is more likely to converge to non-flat minima. To address this issue, the AdamNX algorithm is proposed. Its core innovation lies in the proposition of a novel type of second-order moment estimation exponential decay rate, which gradually weakens the learning step correction strength as training progresses, and degrades to momentum SGD in the stable training period, thereby improving the stability of training in the stable period and possibly enhancing generalization ability. Experimental results show that our second-order moment estimation exponential decay rate is better than the current second-order moment estimation exponential decay rate, and AdamNX can stably outperform Adam and its variants in terms of performance. Our code is open-sourced at https://github.com/mengzhu0308/AdamNX.

algorithm, artificial intelligence, machine learning, (18 more...)

arXiv.org Machine Learning

2511.13465

Country:

North America > Canada > Ontario > Toronto (0.14)
Asia > China > Jiangxi Province > Nanchang (0.05)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Almost Sure Convergence Analysis of Differentially Private Stochastic Gradient Methods

Mukherjee, Amartya, Liu, Jun

arXiv.org Artificial IntelligenceNov-21-2025

Abstract-- Differentially private stochastic gradient descent (DP-SGD) has become the standard algorithm for training machine learning models with rigorous privacy guarantees. Despite its widespread use, the theoretical understanding of its long-run behavior remains limited: existing analyses typically establish convergence in expectation or with high probability, but do not address the almost sure convergence of single trajectories. In this work, we prove that DP-SGD converges almost surely under standard smoothness assumptions, both in nonconvex and strongly convex settings, provided the step sizes satisfy some standard decaying conditions. Our analysis extends to momentum variants such as the stochastic heavy ball (DP-SHB) and Nesterov's accelerated gradient (DP-NAG), where we show that careful energy constructions yield similar guarantees. These results provide stronger theoretical foundations for differentially private optimization and suggest that, despite privacy-induced distortions, the algorithm remains pathwise stable in both convex and nonconvex regimes.

artificial intelligence, gradient, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2511.16587

Country: North America > Canada (0.28)

Genre: Research Report (0.50)

Industry: Information Technology > Security & Privacy (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Add feedback

Descend or Rewind? Stochastic Gradient Descent Unlearning

Mu, Siqiao, Klabjan, Diego

arXiv.org Artificial IntelligenceNov-21-2025

Machine unlearning algorithms aim to remove the impact of selected training data from a model without the computational expenses of retraining from scratch. Two such algorithms are ``Descent-to-Delete" (D2D) and ``Rewind-to-Delete" (R2D), full-batch gradient descent algorithms that are easy to implement and satisfy provable unlearning guarantees. In particular, the stochastic version of D2D is widely implemented as the ``finetuning" unlearning baseline, despite lacking theoretical backing on nonconvex functions. In this work, we prove $(ε, δ)$ certified unlearning guarantees for stochastic R2D and D2D for strongly convex, convex, and nonconvex loss functions, by analyzing unlearning through the lens of disturbed or biased gradient systems, which may be contracting, semi-contracting, or expansive respectively. Our argument relies on optimally coupling the random behavior of the unlearning and retraining trajectories, resulting in a probabilistic sensitivity bound that can be combined with a novel relaxed Gaussian mechanism to achieve $(ε, δ)$ unlearning. We determine that D2D can yield tighter guarantees for strongly convex functions compared to R2D by relying on contraction to a unique global minimum. However, unlike D2D, R2D can achieve unlearning in the convex and nonconvex setting because it draws the unlearned model closer to the retrained model by reversing the accumulated disturbances.

algorithm, artificial intelligence, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2511.15983

Country:

North America > United States (0.46)
Europe > Austria (0.28)

Genre: Research Report (0.64)

Industry:

Information Technology > Security & Privacy (1.00)
Law (0.92)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Add feedback

Improving Iterative Gaussian Processes via Warm Starting Sequential Posteriors

Dong, Alan Yufei, Lin, Jihao Andreas, Hernández-Lobato, José Miguel

arXiv.org Machine LearningNov-21-2025

Scalable Gaussian process (GP) inference is essential for sequential decision-making tasks, yet improving GP scalability remains a challenging problem with many open avenues of research. This paper focuses on iterative GPs, where iterative linear solvers, such as conjugate gradients, stochastic gradient descent or alternative projections, are used to approximate the GP posterior. We propose a new method which improves solver convergence of a large linear system by leveraging the known solution to a smaller system contained within. This is significant for tasks with incremental data additions, and we show that our technique achieves speed-ups when solving to tolerance, as well as improved Bayesian optimisation performance under a fixed compute budget.

artificial intelligence, experiment, machine learning, (14 more...)

arXiv.org Machine Learning

2511.1634

Country: Europe (0.14)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.56)

Add feedback

Conditional Generative Moment-Matching Networks

Neural Information Processing SystemsNov-20-2025, 23:47:51 GMT

Maximum mean discrepancy (MMD) has been successfully applied to learn deep generative models for characterizing a joint distribution of variables via kernel mean embedding. In this paper, we present conditional generative moment-matching networks (CGMMN), which learn a conditional distribution given some input variables based on a conditional maximum mean discrepancy (CMMD) criterion. The learning is performed by stochastic gradient descent with the gradient calculated by back-propagation. We evaluate CGMMN on a wide range of tasks, including predictive modeling, contextual generation, and Bayesian dark knowledge, which distills knowledge from a Bayesian model by learning a relatively small CGMMN student network. Our results demonstrate competitive performance in all the tasks.

conditional generative moment-matching network, electronic proceedings, name change, (3 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.62)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.62)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.62)

Add feedback