AITopics | Gradient Descent

In this study, we provide a formal theoretical analysis in which we derive explicit conditions on the step-size such that the metastability behavior of the discrete-time system is similar to that of its continuous-time limit.

exit time, noise, sgd, (14 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
North America > United States (0.04)
North America > Canada (0.04)
(3 more...)

Genre: Research Report > New Finding (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.89)

Stability and Generalization Analysis of Gradient Methods for Shallow Neural Networks

Yunwen Lei

Neural Information Processing Systems, Aug-19-2025, 21:46:28 GMT

While significant theoretical progress has been achieved, unveiling the generalization mystery of overparameterized neural networks still remains largely elusive. In this paper, we study the generalization behavior of shallow neural networks (SNNs) by leveraging the concept of algorithmic stability. We consider gradient descent (GD) and stochastic gradient descent (SGD) to train SNNs, for both of which we develop consistent excess risk bounds by balancing the optimization and generalization via early-stopping.
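As a rough illustration of the early-stopping idea mentioned in this abstract, the sketch below trains a tiny one-hidden-layer network with full-batch gradient descent and keeps the iterate with the lowest validation error. Everything here is an assumption made for illustration: the tanh network, the synthetic sine data, the step size, and the patience-based stopping rule are hypothetical choices, and the paper itself selects the stopping time through its theoretical analysis rather than a validation set.

```python
import math
import random


def predict(params, x):
    """One-hidden-layer network with tanh units, scalar input and output."""
    w, a = params
    return sum(a_k * math.tanh(w_k * x) for w_k, a_k in zip(w, a))


def mse(params, data):
    """Mean squared error of the network on a list of (x, y) pairs."""
    total = 0.0
    for x, y in data:
        err = predict(params, x) - y
        total += err * err
    return total / len(data)


def gd_step(params, data, step):
    """One full-batch gradient descent step on half the mean squared error."""
    w, a = params
    gw = [0.0] * len(w)
    ga = [0.0] * len(a)
    for x, y in data:
        err = predict(params, x) - y
        for k in range(len(w)):
            h = math.tanh(w[k] * x)
            ga[k] += err * h / len(data)
            gw[k] += err * a[k] * (1.0 - h * h) * x / len(data)
    new_w = [w_k - step * g for w_k, g in zip(w, gw)]
    new_a = [a_k - step * g for a_k, g in zip(a, ga)]
    return new_w, new_a


def train_with_early_stopping(train, val, width=8, step=0.05,
                              max_iters=500, patience=20, seed=0):
    """Run GD and return the iterate with the lowest validation error."""
    rng = random.Random(seed)
    params = ([rng.gauss(0.0, 1.0) for _ in range(width)],
              [rng.gauss(0.0, 1.0) / math.sqrt(width) for _ in range(width)])
    best_params, best_val, best_iter = params, mse(params, val), 0
    for t in range(1, max_iters + 1):
        params = gd_step(params, train, step)
        current = mse(params, val)
        if current < best_val:
            best_params, best_val, best_iter = params, current, t
        elif t - best_iter >= patience:
            break  # no improvement for `patience` steps: stop early
    return best_params, best_val, best_iter


data_rng = random.Random(1)


def sample(n):
    """Noisy samples of sin(x) on [-2, 2] (synthetic, for illustration only)."""
    points = []
    for _ in range(n):
        x = data_rng.uniform(-2.0, 2.0)
        points.append((x, math.sin(x) + 0.3 * data_rng.gauss(0.0, 1.0)))
    return points


train_data, val_data = sample(30), sample(30)
best_params, best_val, best_iter = train_with_early_stopping(train_data, val_data)
```

Here `best_iter` plays the role of the stopping time: running longer keeps lowering the training error, while the returned iterate is the one that did best on held-out data.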

artificial intelligence, generalization, machine learning, (13 more...)

Neural Information Processing Systems

Country:

North America > United States (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > China > Hong Kong > Kowloon (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.78)

Zeroth-Order Negative Curvature Finding: Escaping Saddle Points without Gradients

Neural Information Processing Systems, Aug-19-2025, 21:23:23 GMT

A recent line of work shows that, by adding uniform random perturbations, first-order (FO) methods can efficiently escape saddle points and converge to second-order stationary points (SOSPs).
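The perturbation idea for first-order methods described above can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the zeroth-order algorithm of the paper: the objective, step size, perturbation radius, gradient tolerance, and waiting period are all hypothetical choices. The objective f(x, y) = 0.5x² + 0.25y⁴ − 0.5y² has a strict saddle at the origin and minima at (0, 1) and (0, −1).

```python
import math
import random


def grad(x, y):
    """Gradient of f(x, y) = 0.5*x**2 + 0.25*y**4 - 0.5*y**2."""
    return x, y**3 - y


def descend(x, y, step=0.1, iters=1000, radius=0.0, tol=1e-3, wait=50, seed=0):
    """Gradient descent; if radius > 0, perturb whenever the gradient is tiny.

    A perturbation is drawn uniformly from a disk of the given radius and is
    applied at most once every `wait` iterations.
    """
    rng = random.Random(seed)
    last_perturbed = -wait
    for t in range(iters):
        gx, gy = grad(x, y)
        small_gradient = math.hypot(gx, gy) < tol
        if radius > 0.0 and small_gradient and t - last_perturbed >= wait:
            angle = rng.uniform(0.0, 2.0 * math.pi)
            r = radius * math.sqrt(rng.random())  # sqrt gives uniform area density
            x += r * math.cos(angle)
            y += r * math.sin(angle)
            last_perturbed = t
            gx, gy = grad(x, y)
        x -= step * gx
        y -= step * gy
    return x, y


# Starting on the line y = 0, plain gradient descent slides into the saddle.
plain = descend(1.0, 0.0)
# With small uniform perturbations the iterate leaves the saddle and settles
# near one of the two minima.
perturbed = descend(1.0, 0.0, radius=0.01)
```

Without perturbation the y-coordinate never moves, because its partial derivative is exactly zero along y = 0. The perturbed run picks up a small y-component near the saddle, which the negative curvature then amplifies.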

artificial intelligence, machine learning, saddle point, (15 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Jordan (0.05)
Asia > China > Jiangsu Province > Nanjing (0.04)
Asia > China > Heilongjiang Province > Harbin (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.31)

High-dimensional Asymptotics of Feature Learning: How One Gradient Step Improves the Representation

Jimmy Ba, Murat A. Erdogdu, Taiji Suzuki

Neural Information Processing Systems, Aug-19-2025, 20:48:30 GMT

We consider two scalings of the first-step learning rate η. For small η, we establish a Gaussian equivalence property for the trained feature map, and prove that the learned kernel improves upon the initial random feature model, but cannot defeat the best linear model on the input.

artificial intelligence, machine learning, neural network, (12 more...)

Neural Information Processing Systems

Country:

North America > Canada > Ontario > Toronto (0.14)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.67)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.48)

Accelerating SGD for Highly Ill-Conditioned Huge-Scale Online Matrix Completion

Gavin Zhang, University of Illinois at Urbana–Champaign, jialun2@illinois.edu
Hong-Ming Chiu, University of Illinois at Urbana–Champaign, hmchiu2@illinois.edu
Richard Y. Zhang, University of Illinois at Urbana–Champaign, ryz@illinois.edu

Neural Information Processing Systems, Aug-19-2025, 19:56:27 GMT

The matrix completion problem seeks to recover a d × d ground truth matrix of low rank r ≪ d from observations of its individual elements. Real-world matrix completion is often a huge-scale optimization problem, with d so large that even the simplest full-dimension vector operations with O(d) time complexity become prohibitively expensive. Stochastic gradient descent (SGD) is one of the few algorithms capable of solving matrix completion on a huge scale, and can also naturally handle streaming data over an evolving ground truth. Unfortunately, SGD experiences a dramatic slow-down when the underlying ground truth is ill-conditioned; it requires at least O(κ log(1/ε)) iterations to get ε-close to a ground truth matrix with condition number κ. In this paper, we propose a preconditioned version of SGD that preserves all the favorable practical qualities of SGD for huge-scale online optimization while also making it agnostic to κ. For a symmetric ground truth and the Root Mean Square Error (RMSE) loss, we prove that the preconditioned SGD converges to ε-accuracy in O(log(1/ε)) iterations, with a rapid linear convergence rate as if the ground truth were perfectly conditioned with κ = 1. In our experiments, we observe a similar acceleration for item-item collaborative filtering on the MovieLens25M dataset via a pair-wise ranking loss, with 100 million training pairs and 10 million testing pairs.
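To make the baseline in this abstract concrete, here is a minimal sketch of plain SGD for symmetric matrix completion in the simplest rank-1 case. It is not the preconditioned variant proposed in the paper. The ground truth vector `z`, the step size, the iteration count, and the initialization near the truth are hypothetical choices made for illustration. Each update reads one entry and touches only the two coordinates it involves, which is why the per-step cost does not grow with d.

```python
import math
import random


def rmse(x, M):
    """Root mean square error between x x^T and M over all entries."""
    d = len(x)
    total = 0.0
    for i in range(d):
        for j in range(d):
            err = x[i] * x[j] - M[i][j]
            total += err * err
    return math.sqrt(total / (d * d))


def sgd_complete(M, x, step=0.02, iters=5000, seed=0):
    """Plain SGD on 0.5 * (x_i * x_j - M_ij)^2, one sampled entry per step."""
    rng = random.Random(seed)
    x = list(x)  # work on a copy so the caller's initial point is kept
    d = len(x)
    for _ in range(iters):
        i, j = rng.randrange(d), rng.randrange(d)
        err = x[i] * x[j] - M[i][j]
        if i == j:
            # Diagonal entry: d/dx_i of 0.5 * (x_i^2 - M_ii)^2 is 2 * err * x_i.
            x[i] -= step * 2.0 * err * x[i]
        else:
            # Off-diagonal entry: update both coordinates from the old values.
            x[i], x[j] = x[i] - step * err * x[j], x[j] - step * err * x[i]
    return x


z = [1.0, -0.5, 2.0, 1.5]                 # hypothetical rank-1 ground truth factor
M = [[zi * zj for zj in z] for zi in z]   # ground truth matrix M = z z^T
noise = random.Random(1)
x0 = [zi + 0.1 * noise.uniform(-1.0, 1.0) for zi in z]
x_hat = sgd_complete(M, x0)
```

Comparing `rmse(x0, M)` with `rmse(x_hat, M)` shows how far the error has dropped. This toy ground truth is well-conditioned, so it does not exhibit the slow-down with κ that motivates the preconditioned method.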

artificial intelligence, machine learning, matrix completion, (12 more...)

Neural Information Processing Systems

Country:

North America > United States > Illinois (0.05)
Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.57)

Global Convergence and Stability of Stochastic Gradient Descent

Neural Information Processing Systems, Aug-19-2025, 16:04:17 GMT

In this work, we demonstrate the restrictiveness of these assumptions using three canonical models in machine learning.

artificial intelligence, assumption, machine learning, (13 more...)

Neural Information Processing Systems

Country:

North America > United States > Wisconsin > Dane County > Madison (0.14)
North America > United States > Ohio (0.04)
North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)
(2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.93)

Implicit Regularization or Implicit Conditioning? Exact Risk Trajectories of SGD in High Dimensions

Neural Information Processing Systems, Aug-19-2025, 16:02:05 GMT

Stochastic gradient descent (SGD) is a pillar of modern machine learning, serving as the go-to optimization algorithm for a diverse array of problems. While the empirical success of SGD is often attributed to its computational efficiency and favorable generalization behavior, neither effect is well understood and disentangling them remains an open problem.

artificial intelligence, machine learning, sgd, (12 more...)

Neural Information Processing Systems

Country:

North America > Canada > Quebec > Montreal (0.14)
Europe > Russia (0.04)
Asia > Russia (0.04)
(5 more...)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.59)

[Title not recovered; figure labels extracted in its place: MNIST Partition 1, MNIST Partition 2, MNIST Partition 2 (Permuted Labels), CIFAR-10, Equally Initialized Parameters, Client Gradient Update, Label Heterogeneous, Domain Heterogeneous, Local Training Steps]

Neural Information Processing Systems, Aug-19-2025, 15:22:50 GMT

We extensively validate our method in both label- and domain-heterogeneous settings, where it outperforms the state-of-the-art personalized federated learning methods.

artificial intelligence, federated learning, machine learning, (17 more...)

Neural Information Processing Systems

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Virginia (0.04)
Asia > South Korea > Seoul > Seoul (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)

Industry: Education (0.51)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.41)

On the Double Descent of Random Features Models Trained with SGD

Neural Information Processing Systems, Aug-19-2025, 13:26:56 GMT

We study the generalization properties of random features (RF) regression in high dimensions, optimized by stochastic gradient descent (SGD), in the under- and over-parameterized regimes.

artificial intelligence, assumption, machine learning, (15 more...)

Neural Information Processing Systems

Country:

North America > United States (0.14)
Europe > Belgium > Flanders > Flemish Brabant > Leuven (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)
