AITopics | Gradient Descent

Collaborating Authors

Gradient Descent

News Overviews Instructional Materials AI-Alerts Classics

Learning and Transferring Sparse Contextual Bigrams with Linear Transformers

Neural Information Processing SystemsOct-9-2025, 21:04:13 GMT

However, the theoretical basis remains unclear.

arxiv preprint arxiv, gradient, transformer, (15 more...)

Neural Information Processing Systems

Country: Asia > Middle East > Jordan (0.04)

Genre: Research Report > Experimental Study (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Achieving Near-Optimal Convergence for Distributed Minimax Optimization with Adaptive Stepsizes

Neural Information Processing SystemsOct-9-2025, 20:56:00 GMT

Sharma et al. (2022) provide Y ang et al. (2022a) integrate Local SGDA with stochastic gradient estimators to eliminate the More recently, Zhang et al. (2023) adopt compressed momentum methods with Local SGD to increase the communication efficiency of the algorithm. For centralized nonconvex minimax problems, Y ang et al. (2022b) show that, even in deterministic settings, GDA-based methods necessitate the timescale separation of the stepsizes for primal and dual updates.

algorithm, experiment, stepsize, (15 more...)

Neural Information Processing Systems

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States (0.14)
Asia > China (0.04)
(3 more...)

Genre: Research Report > Experimental Study (0.92)

Industry:

Information Technology (0.45)
Government (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.62)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

Add feedback

Implicit Regularization of Decentralized Gradient Descent for Decentralized Sparse Regression

Neural Information Processing SystemsOct-9-2025, 20:21:58 GMT

We consider learning a sparse model from linear measurements taken by a network of agents.

hypothesis, inequality, regularization, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Pennsylvania (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.92)

Industry: Information Technology > Security & Privacy (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Communications > Networks (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.64)

Add feedback

1b96f01343ff10150e6719eb163e1536-Paper-Conference.pdf

Neural Information Processing SystemsOct-9-2025, 20:05:04 GMT

algorithm, stability, theorem 5, (17 more...)

Neural Information Processing Systems

Country:

Asia > China > Beijing > Beijing (0.04)
Asia > Russia > Siberian Federal District > Novosibirsk Oblast > Novosibirsk (0.04)
Asia > China > Shandong Province (0.04)

Genre: Research Report > Experimental Study (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.46)

Add feedback

Annealed Multiple Choice Learning: Overcoming limitations of Winner-takes-all with annealing David Perera

Neural Information Processing SystemsOct-9-2025, 19:02:50 GMT

We introduce Annealed Multiple Choice Learning (aMCL) which combines simulated annealing with MCL.

amcl, dataset, hypothesis, (15 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (1.00)

Industry: Education (0.61)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Speech (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.66)

Add feedback

Stochastic Optimization Schemes for Performative Prediction with Nonconvex Loss

Neural Information Processing SystemsOct-9-2025, 18:43:11 GMT

This paper studies a risk minimization problem with decision dependent data distribution.

deployment scheme, performative prediction, sgd-gd, (15 more...)

Neural Information Processing Systems

Country:

Asia > China > Hong Kong (0.04)
Asia > Middle East > Republic of Türkiye > Samsun Province > Samsun (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Security & Privacy (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.47)

Add feedback

Accelerating SGD for Highly Ill-Conditioned Huge-Scale Online Matrix Completion

Gavin Zhang, University of Illinois at Urbana–Champaign, jialun2@illinois.edu, "3026 Hong-Ming Chiu, University of Illinois at Urbana–Champaign, hmchiu2@illinois.edu, "3026 Richard Y. Zhang, University of Illinois at Urbana–Champaign, ryz@illinois.edu

Neural Information Processing SystemsOct-9-2025, 16:48:52 GMT

The matrix completion problem seeks to recover a d d ground truth matrix of low rank r d from observations of its individual elements. Real-world matrix completion is often a huge-scale optimization problem, with d so large that even the simplest full-dimension vector operations with O ( d) time complexity become prohibitively expensive. Stochastic gradient descent (SGD) is one of the few algorithms capable of solving matrix completion on a huge scale, and can also naturally handle streaming data over an evolving ground truth. Unfortunately, SGD experiences a dramatic slow-down when the underlying ground truth is ill-conditioned; it requires at least O ( κ log(1 /ϵ)) iterations to get ϵ -close to ground truth matrix with condition number κ. In this paper, we propose a preconditioned version of SGD that preserves all the favorable practical qualities of SGD for huge-scale online optimization while also making it agnostic to κ. For a symmetric ground truth and the Root Mean Square Error (RMSE) loss, we prove that the preconditioned SGD converges to ϵ -accuracy in O (log(1 /ϵ)) iterations, with a rapid linear convergence rate as if the ground truth were perfectly conditioned with κ = 1 . In our experiments, we observe a similar acceleration for item-item collaborative filtering on the MovieLens25M dataset via a pair-wise ranking loss, with 100 million training pairs and 10 million testing pairs.

artificial intelligence, iteration, machine learning, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Illinois (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)

Add feedback

A Sampling using Flows

Neural Information Processing SystemsOct-9-2025, 15:59:24 GMT

Neural transport augmented samplers have been subsequently extended by Hoffman et al. (2019) While, Duncan et al. (2019) have studied the Another contribution of this paper is learning equivariant Energy-Based Models using equivariant Stein variational gradient descent. Energy Based Models have witnessed a revival recently. As far as the authors are aware. Figure 8: Recommended to view in color . Translucent yellow dots represent the distribution.

artificial intelligence, machine learning, svgd, (16 more...)

Neural Information Processing Systems

Technology: