AITopics | Gradient Descent

Collaborating Authors

Gradient Descent

News Overviews Instructional Materials AI-Alerts Classics

Using Statistics to Automate Stochastic Optimization

Neural Information Processing SystemsAug-20-2025, 06:19:12 GMT

Rather than changing the learning rate at each iteration, we propose an approach that automates the most common hand-tuning heuristic: use a constant learning rate until "progress stops", then drop. We design an explicit statistical test that determines when the dynamics of stochastic gradient descent reach a stationary distribution.

estimator, stationary distribution, variance, (16 more...)

Neural Information Processing Systems

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
North America > United States (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(4 more...)

Genre:

Research Report (0.46)
Instructional Material (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.56)

Add feedback

Optimal Statistical Rates for Decentralised Non-Parametric Regression with Linear Speed-Up

Dominic Richards, Patrick Rebeschini

Neural Information Processing SystemsAug-20-2025, 05:55:34 GMT

The presence of the threshold comes from statistics.

agent, gradient descent, iteration, (13 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
North America > United States > Massachusetts (0.04)
North America > Canada (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

Add feedback

A Universally Optimal Multistage Accelerated Stochastic Gradient Method

Necdet Serhat Aybat, Alireza Fallah, Mert Gurbuzbalaban, Asuman Ozdaglar

Neural Information Processing SystemsAug-20-2025, 04:31:36 GMT

Neural Information Processing Systems http://nips.cc/

algorithm, iteration, m-asg, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > New York > New York County > New York City (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
(6 more...)

Genre: Research Report > New Finding (0.46)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.51)

Add feedback

Dynamics of stochastic gradient descent for two-layer neural networks in the teacher-student setup

Sebastian Goldt, Madhu Advani, Andrew M. Saxe, Florent Krzakala, Lenka Zdeborová

Neural Information Processing SystemsAug-20-2025, 02:11:13 GMT

Deep neural networks behind state-of-the-art results in image classification and other domains have one thing in common: their size.

generalisation error, neural network, student, (15 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > Canada (0.04)
(2 more...)

Genre: Research Report (0.68)

Industry: Education (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Stochastic Gradient Hamiltonian Monte Carlo Methods with Recursive Variance Reduction

Difan Zou, Pan Xu, Quanquan Gu

Neural Information Processing SystemsAug-20-2025, 01:27:48 GMT

We provide a convergence analysis of SRVR-HMC for sampling from a class of non-log-concave distributions and show that SRVR-HMC converges faster than all existing HMC-type algorithms based on underdamped Langevin dynamics.

algorithm, gradient complexity, langevin dynamic, (12 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.29)
Asia > Middle East > Jordan (0.05)
North America > Canada (0.04)
Asia > Afghanistan > Parwan Province > Charikar (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.56)

Add feedback

Towards Explaining the Regularization Effect of Initial Large Learning Rate in Training Neural Networks

Yuanzhi Li, Colin Wei, Tengyu Ma

Neural Information Processing SystemsAug-20-2025, 00:46:33 GMT

Neural Information Processing Systems http://nips.cc/

arxiv preprint arxiv, initial learning rate, learning rate, (14 more...)

Neural Information Processing Systems

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.31)

Add feedback

RSN: Randomized Subspace Newton

Neural Information Processing SystemsAug-20-2025, 00:39:47 GMT

We develop a randomized Newton method capable of solving learning problems with huge dimensional feature spaces, which is a common setting in applications such as medical imaging, genomics and seismology. Our method leverages randomized sketching in a new way, by finding the Newton direction constrained to the space spanned by a random sketch. We develop a simple global linear convergence theory that holds for practically all sketching techniques, which gives the practitioners the freedom to design custom sketching approaches suitable for particular applications. We perform numerical experiments which demonstrate the efficiency of our method as compared to accelerated gradient descent and the full Newton method. Our method can be seen as a refinement and randomized extension of the results of Karimireddy, Stich, and Jaggi [18].

algorithm 1, matrix, newton method, (14 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Saudi Arabia (0.04)
North America > United States > Connecticut > New Haven County > New Haven (0.04)
North America > Canada (0.04)
(5 more...)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.35)

Add feedback

the liberty to group and reword some of the reviewers comment (in blue italic) to save space. 3 General answer on the usefulness of gradient descent, its theoretical guarantees, and its scalability

Neural Information Processing SystemsAug-20-2025, 00:13:37 GMT

We thank the reviewers for the time they spent evaluating our manuscript and for their valuable comments. We agree that having theoretical guarantees would be a big plus. As for scalability, the bottleneck of our method is the single-linkage algorithm. Similarly to Monath et al. (NeurIPS 2017), our idea consists Given the significant body of additional material, we feel that this topic is best left to a future publication. Line 8,56,70,93: I would suggest a more cautious usage of the word "equivalent".

dasgupta, gradient descent, theoretical guarantee, (14 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.44)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.33)

Add feedback

Explainable Learning Rate Regimes for Stochastic Optimization

Yang, Zhuang

arXiv.org Artificial IntelligenceAug-20-2025

Modern machine learning is trained by stochastic gradient descent (SGD), whose performance critically depends on how the learning rate (LR) is adjusted and decreased over time. Yet existing LR regimes may be intricate, or need to tune one or more additional hyper-parameters manually whose bottlenecks include huge computational expenditure, time and power in practice. This work, in a natural and direct manner, clarifies how LR should be updated automatically only according to the intrinsic variation of stochastic gradients. An explainable LR regime by leveraging stochastic second-order algorithms is developed, behaving a similar pattern to heuristic algorithms but implemented simply without any parameter tuning requirement, where it is of an automatic procedure that LR should increase (decrease) as the norm of stochastic gradients decreases (increases). The resulting LR regime shows its efficiency, robustness, and scalability in different classical stochastic algorithms, containing SGD, SGDM, and SIGNSGD, on machine learning tasks.

algorithm, artificial intelligence, machine learning, (12 more...)

arXiv.org Artificial Intelligence

2508.13639

Country: Asia > China (0.28)

Genre: Research Report (0.83)

Industry: Education > Educational Setting (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.97)

Add feedback