AITopics | Gradient Descent

This paper studies empirical risk minimization (ERM) problems for large-scale datasets and incorporates the idea of adaptive sample size methods to improve the guaranteed convergence bounds for first-order stochastic and deterministic methods. In contrast to traditional methods that attempt to solve the ERM problem corresponding to the full dataset directly, adaptive sample size schemes start with a small number of samples and solve the corresponding ERM problem to its statistical accuracy. The sample size is then grown geometrically - e.g., scaling by a factor of two - and use the solution of the previous ERM as a warm start for the new ERM. Theoretical analyses show that the use of adaptive sample size methods reduces the overall computational cost of achieving the statistical accuracy of the whole dataset for a broad range of deterministic and stochastic first-order methods. The gains are specific to the choice of method. When particularized to, e.g., accelerated gradient descent and stochastic variance reduce gradient, the computational cost advantage is a logarithm of the number of training samples. Numerical experiments on various datasets confirm theoretical claims and showcase the gains of using the proposed adaptive sample size scheme.

artificial intelligence, machine learning, statistical accuracy, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > Pennsylvania (0.04)
North America > United States > Nevada (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
(5 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.36)

Add feedback

Non-convex Finite-Sum Optimization Via SCSG Methods

Lihua Lei, Cheng Ju, Jianbo Chen, Michael I. Jordan

Neural Information Processing SystemsNov-21-2025, 11:27:08 GMT

Moreover, SCSG is never worse than the state-of-the-art methods based on variance reduction and it significantly outperforms them when the target accuracy is low.

artificial intelligence, machine learning, scsg, (15 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Jordan (0.05)
North America > United States > Massachusetts (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)

Genre: Research Report (0.66)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.31)

Add feedback

A PAC-Bayesian Analysis of Randomized Learning with Application to Stochastic Gradient Descent

Ben London

Neural Information Processing SystemsNov-21-2025, 11:23:29 GMT

We study the generalization error of randomized learning algorithms--focusing on stochastic gradient descent (SGD)--using a novel combination of P AC-Bayes and algorithmic stability. Importantly, our generalization bounds hold for all posterior distributions on an algorithm's random hyperparameters, including distributions that depend on the training data.

algorithm, artificial intelligence, machine learning, (17 more...)

Neural Information Processing Systems

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > California > Los Angeles County > Long Beach (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Add feedback

Gradient descent GAN optimization is locally stable

Vaishnavh Nagarajan, J. Zico Kolter

Neural Information Processing SystemsNov-21-2025, 11:21:27 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, equilibrium, machine learning, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > New Jersey (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.66)

Add feedback

QSGD: Communication-Efficient SGD via Gradient Quantization and Encoding

Dan Alistarh, Demjan Grubic, Jerry Li, Ryota Tomioka, Milan Vojnovic

Neural Information Processing SystemsNov-21-2025, 10:33:31 GMT

Parallel implementations of stochastic gradient descent (SGD) have received significant research attention, thanks to its excellent scalability properties.

artificial intelligence, deep learning, machine learning, (17 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Europe > Switzerland > Zürich > Zürich (0.04)
North America > United States > New Jersey > Middlesex County > Piscataway (0.04)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.73)

Add feedback

Convolutional Phase Retrieval

Qing Qu, Yuqian Zhang, Yonina Eldar, John Wright

Neural Information Processing SystemsNov-21-2025, 10:28:52 GMT

This model is motivated by applications to channel estimation, optics, and underwater acoustic communication, where the signal of interest is acted on by a given channel/filter, and phase information is difficult or impossible to acquire. We show that when a is random and m is sufficiently large, x can be efficiently recovered up to a global phase using a combination of spectral initialization and generalized gradient descent. The main challenge is coping with dependencies in the measurement operator; we overcome this challenge by using ideas from decoupling theory, suprema of chaos processes and the restricted isometry property of random circulant matrices, and recent analysis for alternating minimizing methods.

artificial intelligence, machine learning, retrieval, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Los Angeles County > Long Beach (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Israel (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.36)

Add feedback

Barzilai-Borwein Step Size for Stochastic Gradient Descent

Conghui Tan, Shiqian Ma, Yu-Hong Dai, Yuqiu Qian

Neural Information Processing SystemsNov-21-2025, 09:51:38 GMT

SGD that can reduce the variance and improve the complexity.

artificial intelligence, machine learning, step size, (15 more...)

Neural Information Processing Systems

Country:

Asia > China > Hong Kong (0.05)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Asia > China > Beijing > Beijing (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.90)

Add feedback

Accelerating Stochastic Composition Optimization

Mengdi Wang, Ji Liu, Ethan Fang

Neural Information Processing SystemsNov-21-2025, 08:03:09 GMT

The popular stochastic gradient methods are well suited for minimizing expected-value objective functions or the sum of a large number of loss functions. Stochastic gradient methods find wide applications in estimation, online learning, and training of deep neural networks.

artificial intelligence, machine learning, reinforcement learning, (12 more...)

Neural Information Processing Systems

Country: Europe (0.68)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.96)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)

Add feedback

VAE Learning via Stein Variational Gradient Descent

Yuchen Pu, Zhe Gan, Ricardo Henao, Chunyuan Li, Shaobo Han, Lawrence Carin

Neural Information Processing SystemsNov-21-2025, 07:57:38 GMT

A new method for learning variational autoencoders (V AEs) is developed, based on Stein variational gradient descent. A key advantage of this approach is that one need not make parametric assumptions about the form of the encoder distribution. Performance is further enhanced by integrating the proposed encoder with importance sampling. Excellent performance is demonstrated across multiple unsupervised and semi-supervised problems, including semi-supervised analysis of the ImageNet data, demonstrating the scalability of the model to large datasets.

artificial intelligence, deep learning, machine learning, (16 more...)

Neural Information Processing Systems

Country: North America > United States > California > Los Angeles County > Long Beach (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback