AITopics | Gradient Descent

Gradient Descent

Convergence of Mean-field Langevin dynamics: Time-space discretization, stochastic gradient, and variance reduction Taiji Suzuki 1,2, Denny Wu

Neural Information Processing Systems | Feb-9-2026, 20:00:42 GMT

Recent works have shown that mean-field Langevin dynamics (MFLD) globally minimizes an entropy-regularized convex functional in the space of measures.

artificial intelligence, inequality, machine learning, (16 more...)

Country:

North America > United States (0.14)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
Asia > Middle East > Jordan (0.04)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.65)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.51)

8b9e7ab295e87570551db122a04c6f7c-Supplemental.pdf

Neural Information Processing Systems | Feb-9-2026, 19:00:18 GMT

Neural transport augmented sampling, first introduced by Parno and Marzouk (2018), is a general method for using normalizing flows to sample from a given density π. Thus, samples can be generated from π(θ) by running an MCMC chain in the Z-space and pushing these samples onto the Θ-space using T. Neural transport augmented samplers have been subsequently extended by Hoffman et al. In this paper, we proposed an equivariant Stein variational gradient descent algorithm for sampling from densities that are invariant to symmetry transformations. Another contribution of our work is subsequently using this equivariant sampling method to efficiently train equivariant energy-based models for probabilistic modeling and inference.
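The transport idea in the snippet above (run MCMC in the Z-space, then push samples to the Θ-space through T) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the target π is a one-dimensional Gaussian, a fixed affine map stands in for a trained normalizing flow, and the chain is a plain random-walk Metropolis sampler. It is not the method of the listed paper, and the names `MU`, `SIGMA`, `transport`, and `sample_theta` are hypothetical.

```python
import math
import random

# Hypothetical target: pi = Normal(MU, SIGMA^2) on the Theta-space.
MU, SIGMA = 5.0, 2.0


def log_pi(theta):
    """Unnormalized log-density of the target on the Theta-space."""
    return -0.5 * ((theta - MU) / SIGMA) ** 2


def transport(z):
    """Affine map T: Z -> Theta, standing in for a trained flow (assumption)."""
    return MU + SIGMA * z


def log_pullback(z):
    """Log-density seen by the chain in Z-space: log pi(T(z)) + log |dT/dz|."""
    return log_pi(transport(z)) + math.log(SIGMA)


def sample_theta(n, step=1.0, seed=0):
    """Random-walk Metropolis in Z-space; each state is pushed through T."""
    rng = random.Random(seed)
    z = 0.0
    out = []
    for _ in range(n):
        proposal = z + rng.gauss(0.0, step)
        log_ratio = log_pullback(proposal) - log_pullback(z)
        # Accept with probability min(1, exp(log_ratio)).
        if rng.random() < math.exp(min(0.0, log_ratio)):
            z = proposal
        out.append(transport(z))
    return out


samples = sample_theta(20000)
mean = sum(samples) / len(samples)
std = (sum((s - mean) ** 2 for s in samples) / len(samples)) ** 0.5
```

Because the affine map matches the target exactly in this toy case, the chain in Z-space sees a standard normal, which is the well-conditioned geometry that transport-based samplers aim for.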

artificial intelligence, experiment, machine learning, (17 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.36)

Adaptive Variance Reduction for Stochastic Optimization under Weaker Assumptions Wei Jiang 1, Sifan Yang

Neural Information Processing Systems | Feb-9-2026, 18:17:05 GMT

Problem (1) has been comprehensively investigated in the literature [Duchi et al., 2011, Kingma and Ba, 2015, Loshchilov and Hutter, 2017], and it is well-known that the classical stochastic gradient descent (SGD) achieves a convergence rate of

artificial intelligence, convergence rate, machine learning, (18 more...)

Country: Asia > China > Jiangsu Province > Nanjing (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry:

Information Technology (0.67)
Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)

6fee03d84375a159ecd3769ebbacae83-Supplemental-Conference.pdf

Neural Information Processing Systems | Feb-9-2026, 17:27:05 GMT

Convergence of stochastic gradient descent for non-smooth problems is a known result. For completeness, we reproduce and adapt a usual proof to our setting. Let us denote by F the class of functions from X to Y we are going to work with. Assumption 1 states that we have a well-specified model F to estimate the median, i.e. Let us begin by controlling the estimation error.

artificial intelligence, dataset, machine learning, (18 more...)

Country: Europe > Hungary > Csongrád-Csanád County > Szeged (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)

Active Labeling: Streaming Stochastic Gradients

Neural Information Processing Systems | Feb-9-2026, 17:27:02 GMT

The workhorse of machine learning is stochastic gradient descent. To access stochastic gradients, it is common to consider iteratively input/output pairs of a training dataset. Interestingly, it appears that one does not need full supervision to access stochastic gradients, which is the main motivation of this paper. After formalizing the "active labeling" problem, which focuses on active learning with partial supervision, we provide a streaming technique that provably minimizes the ratio of generalization error over the number of samples.

artificial intelligence, machine learning, supervision, (17 more...)

Country: North America > United States (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.95)

Consider minimizing an empirical loss min

Neural Information Processing Systems | Feb-9-2026, 17:16:17 GMT

Many learning tasks, such as regression and classification, are usually framed that way [1]. When $N \gg 1$, computing the gradient of the objective in (1) becomes a bottleneck, even if individual gradients $\nabla_\theta L(z_i, \theta)$ are cheap to evaluate. For a fixed computational budget, it is thus tempting to replace vanilla gradient descent by more iterations but using an approximate gradient, obtained using only a few data points. Stochastic gradient descent (SGD; [2]) follows this template.
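The template described above, many cheap iterations that each use a gradient estimated from a few data points, can be sketched as follows. This is a minimal illustration under stated assumptions, not the sampling scheme of the listed paper: the loss is a one-parameter least-squares loss, minibatches are drawn uniformly without replacement, and the names `make_data` and `minibatch_sgd` as well as all hyperparameters are hypothetical.

```python
import random


def make_data(n=1000, slope=3.0, noise=0.1, seed=1):
    """Synthetic pairs z_i = (x_i, y_i) with y_i = slope * x_i + Gaussian noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        data.append((x, slope * x + rng.gauss(0.0, noise)))
    return data


def minibatch_sgd(data, theta0=0.0, lr=0.05, batch_size=8, steps=2000, seed=0):
    """Minimize (1/N) * sum_i L(z_i, theta) with L = 0.5 * (theta * x_i - y_i)^2.

    Each step averages the per-example gradients over a small uniform
    minibatch instead of over all N examples.
    """
    rng = random.Random(seed)
    theta = theta0
    for _ in range(steps):
        batch = rng.sample(data, batch_size)
        # Gradient of L with respect to theta is (theta * x - y) * x.
        grad = sum((theta * x - y) * x for x, y in batch) / batch_size
        theta -= lr * grad
    return theta


data = make_data()
theta_hat = minibatch_sgd(data)
```

Each step touches 8 of the 1000 examples, so 2000 steps cost about as many per-example gradient evaluations as 16 full-gradient steps. The price is extra variance in the update, which is what minibatch selection and variance reduction schemes try to control.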

artificial intelligence, dpp, machine learning, (16 more...)

Country:

Asia > Singapore (0.04)
Europe > France (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.74)

a576eafbce762079f7d1f77fca1c5cc2-Supplemental.pdf

Neural Information Processing Systems | Feb-9-2026, 16:27:47 GMT

continual learning approach, neural network, projection matrix, (15 more...)

Country:

North America > United States > California > Santa Clara County > Stanford (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.69)

6db3ea527f53682657b3d6b02a841340-Supplemental-Conference.pdf

Neural Information Processing Systems | Feb-9-2026, 16:15:52 GMT

We study the asynchronous stochastic gradient descent algorithm for distributed training over n workers which have varying computation and communication frequency over time.

artificial intelligence, gradient, machine learning, (17 more...)

Country:

North America > United States (0.04)
Asia > Middle East > Jordan (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)

6db3ea527f53682657b3d6b02a841340-Paper-Conference.pdf

Neural Information Processing Systems | Feb-9-2026, 16:15:48 GMT

We study the asynchronous stochastic gradient descent algorithm for distributed training over n workers which have varying computation and communication frequency over time.

artificial intelligence, gradient, machine learning, (17 more...)

Country: