AITopics | Gradient Descent

In this section, we provide theoretical analysis of HSPG. Moreover, we further point out that: (1) the Sub-gradient Descent Step we used to achieve a "close enough" solution can be replaced by other methods, and (2) Assumption 4 is only a sufficient condition that we could use to show the "close enough" condition. B.1 Related Work. Problem (12) has been well studied in deterministic optimization with various algorithms that are capable of returning solutions with both low objective value and high group sparsity under proper λ (95; 73; 42; 64). For example, proximal stochastic variance-reduced gradient method (Prox-SVRG) (88) and proximal spider (Prox-Spider) (97) are developed to adopt multi-stage schemes based on the well-known variance reduction technique SVRG proposed in (46) and Spider developed in (22) respectively. Under Assumption 1, the search direction $d_k$ is a descent direction for $\psi_{B_k}(x_k)$, i.e., $d_k^\top \nabla\psi_{B_k}(x_k) < 0$.
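
As a rough illustration of how proximal methods of the kind mentioned above produce group sparsity, the sketch below applies group soft-thresholding, the proximal operator of a mixed l1/l2 penalty. Problem (12) is not reproduced in this excerpt, so the penalty form (a threshold times the sum of per-group Euclidean norms), the function name `group_soft_threshold`, and the toy data are assumptions made for illustration. This is the standard proximal step, not the HSPG half-space step.

```python
import math


def group_soft_threshold(z, groups, threshold):
    """Proximal operator of threshold * sum_g ||x_g||_2 (assumed penalty form).

    z         : list of floats (point to be thresholded)
    groups    : list of index lists, one per non-overlapping group
    threshold : stepsize times regularization weight (hypothetical value)
    """
    x = list(z)
    for g in groups:
        norm = math.sqrt(sum(z[j] ** 2 for j in g))
        # Shrink the whole group toward zero; groups with a small norm vanish.
        scale = max(0.0, 1.0 - threshold / norm) if norm > 0.0 else 0.0
        for j in g:
            x[j] = scale * z[j]
    return x


# Toy data: the first group has norm 5, the second has norm below the threshold.
z = [3.0, 4.0, 0.1, -0.2]
groups = [[0, 1], [2, 3]]
x = group_soft_threshold(z, groups, threshold=1.0)
```

In this toy case the first group is scaled by 0.8 and the second group is set to zero as a whole, which is the group-sparsity effect that a suitable λ is meant to induce.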

artificial intelligence, gk 2, machine learning, (15 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

2cf153951b5e9b39564fc4a0ef6adc1a-Paper-Conference.pdf

Neural Information Processing Systems | Feb-10-2026, 08:15:21 GMT

bregman divergence, convergence, convex, (12 more...)

Country:

Asia > Vietnam (0.04)
Asia > Middle East > Jordan (0.04)
Asia > Middle East > Israel (0.04)

Genre: Research Report > Experimental Study (0.92)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.51)

Random Reshuffling: Simple Analysis with Vast Improvements

Neural Information Processing Systems | Feb-10-2026, 08:14:44 GMT

Random Reshuffling (RR) is an algorithm for minimizing finite-sum functions that utilizes iterative gradient descent steps in conjunction with data reshuffling. Often contrasted with its sibling Stochastic Gradient Descent (SGD), RR is usually faster in practice and enjoys significant popularity in convex and non-convex optimization. The convergence rate of RR has attracted substantial attention recently and, for strongly convex and smooth functions, it was shown to converge faster than SGD if 1) the stepsize is small, 2) the gradients are bounded, and 3) the number of epochs is large.
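
The difference between RR and SGD described above comes down to how indices are drawn within an epoch. The following is a minimal sketch, not the paper's implementation: the function names, the constant stepsize, and the small consistent least-squares problem are all assumptions chosen so that both methods can be run side by side.

```python
import random


def random_reshuffling(grad_i, x0, n, stepsize, epochs, seed=0):
    """RR: each epoch visits every index exactly once, in a fresh random order."""
    rng = random.Random(seed)
    x = list(x0)
    order = list(range(n))
    for _ in range(epochs):
        rng.shuffle(order)  # sampling without replacement within the epoch
        for i in order:
            g = grad_i(x, i)
            x = [xj - stepsize * gj for xj, gj in zip(x, g)]
    return x


def sgd_with_replacement(grad_i, x0, n, stepsize, epochs, seed=0):
    """SGD: every step draws an index independently, with replacement."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(epochs * n):
        g = grad_i(x, rng.randrange(n))
        x = [xj - stepsize * gj for xj, gj in zip(x, g)]
    return x


# Hypothetical finite sum: f_i(x) = 0.5 * (a_i . x - b_i)^2, solved exactly by x_true.
x_true = [2.0, -1.0]
rows = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, -1.0)]
targets = [sum(a * t for a, t in zip(row, x_true)) for row in rows]


def grad_i(x, i):
    residual = sum(a * xj for a, xj in zip(rows[i], x)) - targets[i]
    return [residual * a for a in rows[i]]


x_rr = random_reshuffling(grad_i, [0.0, 0.0], n=4, stepsize=0.5, epochs=200)
x_sgd = sgd_with_replacement(grad_i, [0.0, 0.0], n=4, stepsize=0.5, epochs=200)
```

Because this toy problem has a solution that fits every term exactly, both methods reach it with a constant stepsize. It therefore shows only the sampling difference and says nothing about the convergence-rate comparison in the abstract.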

artificial intelligence, machine learning, variance, (17 more...)

Country:

North America > United States > California > Los Angeles County > Long Beach (0.04)
Asia > Middle East > Saudi Arabia > Mecca Province > Thuwal (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
(6 more...)

Genre: Research Report > New Finding (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.90)

Stochastic Gradient Descent-Ascent and Consensus Optimization for Smooth Games: Convergence Analysis under Expected Co-coercivity

Anonymous

Neural Information Processing Systems | Feb-10-2026, 07:18:53 GMT

While our presentation focuses on this finite-sum structure, most of our convergence results can easily be adapted to the general stochastic setting (see App. D).

artificial intelligence, assumption, machine learning, (18 more...)

Country:

North America > Canada > Quebec > Montreal (0.04)
Asia > Middle East > Jordan (0.04)
Europe > Russia (0.04)
(2 more...)

Genre:

Research Report (0.46)
Overview (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

9f96f36b7aae3b1ff847c26ac94c604e-Paper.pdf

Anonymous

Neural Information Processing Systems | Feb-10-2026, 07:18:50 GMT

algorithm, assumption, convergence, (16 more...)

Country:

North America > Canada > Quebec > Montreal (0.05)
Asia > Middle East > Jordan (0.04)
Europe > Russia (0.04)
(2 more...)

Genre: Overview (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

9d27fdf2477ffbff837d73ef7ae23db9-Supplemental.pdf

Neural Information Processing Systems | Feb-10-2026, 06:03:22 GMT

cagrad, objective, pcgrad, (15 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.51)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.48)

c336346c777707e09cab2a3c79174d90-Supplemental.pdf

Neural Information Processing Systems | Feb-10-2026, 04:26:55 GMT

We also establish new convergence complexities to achieve an approximate KKT solution when the objective can be smooth/nonsmooth, deterministic/stochastic and convex/nonconvex with complexity that is on a par with gradient descent for unconstrained optimization problems in respective cases. To the best of our knowledge, this is the first study of the first-order methods with complexity guarantee for nonconvex sparse-constrained problems.

artificial intelligence, machine learning, xk 1, (17 more...)

Country: