AITopics | Gradient Descent

Fast Mixing of Stochastic Gradient Descent with Normalization and Weight Decay

Neural Information Processing Systems | Feb-8-2026, 11:04:33 GMT

Under 2.1, 2.3, 5.1, 5.2 and 5.3, let x , ( 0) X , ( 0) xinit 2 U for all , > 0 for SGD+WD (2) and SDE (3).

artificial intelligence, arxivpreprintarxiv, machine learning, (11 more...)

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.86)

A Single-Loop Smoothed Gradient Descent-Ascent Algorithm for Nonconvex-Concave Min-Max Problems

Neural Information Processing Systems | Feb-8-2026, 10:55:40 GMT

In other fi( ) with y i > 0 at (x ,y ) contains the solution. I+(y ) to represent y i > 0. a mild assumption Assumption For any (x ,y ) satisfying (3.5), In t f forant (see the decrease t.

artificial intelligence, arxiv, machine learning, (11 more...)

Country:

Asia > China > Guangdong Province > Shenzhen (0.05)
Asia > Middle East > Jordan (0.05)
North America > United States > Illinois > Champaign County > Urbana (0.04)
(2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.41)

442b548e816f05640dec68f497ca38ac-Paper.pdf

Neural Information Processing Systems | Feb-8-2026, 09:57:54 GMT

augmentation, convergence, data augmentation, (11 more...)

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.31)

14fef58f09f2ebe69306e0a322e3be2b-Paper-Conference.pdf

Neural Information Processing Systems | Feb-8-2026, 08:16:59 GMT

adjacency, replacement, theorem 3, (14 more...)

Country:

North America > United States > Arizona (0.04)
North America > United States > Texas (0.04)
North America > United States > Florida > Sarasota County > Sarasota (0.04)
Europe > Italy > Sicily > Palermo (0.04)

Genre:

Research Report > New Finding (0.93)
Research Report > Experimental Study (0.92)

Industry: Information Technology > Security & Privacy (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.86)

A Proof of Theorem 1, w t, and w

Neural Information Processing Systems | Feb-8-2026, 07:17:07 GMT

Let ŵ be this arg min, which is unique since the objective is strongly convex. Substituting the definition of p and rearranging completes the proof. Lemma 2. Let ℓ(·; z) be H-smooth, convex, and non-negative for each z, let the stochastic gradient For the first term on the right-hand side, we note that due to the algorithm's projections, all of the Lemma 3. Let ℓ(·; z) be H-smooth and non-negative for all z and let L This follows almost immediately from [Theorem 2.1.5 This proof is based on ideas similar to those in the proofs of Lemma 5 and Theorem 2 due to Lan [17]. The key difference is that Lan considers a setting in which the variance of the stochastic gradients is uniformly bounded, while in our setting, we do not directly assume any bound on this quantity.

algorithm, artificial intelligence, machine learning, (18 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.56)

1456560769bbc38e4f8c5055048ea712-Paper-Conference.pdf

Neural Information Processing Systems | Feb-8-2026, 06:45:57 GMT

amcl, dataset, hypothesis, (14 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Speech (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.66)

15ce36d35622f126f38e90167de1a350-Paper-Conference.pdf

Neural Information Processing Systems | Feb-8-2026, 05:26:34 GMT

artificial intelligence, gradient descent, machine learning, (13 more...)

Country:

North America > United States > California > Los Angeles County > Long Beach (0.14)
North America > United States > California > San Diego County > San Diego (0.04)
North America > Canada > Quebec > Montreal (0.04)
(10 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.32)

3f8b2a81da929223ae025fcec26dde0d-Paper.pdf

Neural Information Processing Systems | Feb-8-2026, 04:44:46 GMT

algorithm, duality gap, min-max problem, (12 more...)

Country:

North America > United States > Iowa (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > Washington (0.04)
(3 more...)

Genre: Research Report (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.88)

326a8c055c0d04f5b06544665d8bb3ea-Supplemental.pdf

Neural Information Processing Systems | Feb-8-2026, 03:56:35 GMT

angular update, effective learning rate, weight norm, (15 more...)

Country:

North America > United States > California > Los Angeles County > Long Beach (0.04)
Europe > Russia (0.04)
Asia > Russia (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

A Detailed comparisons with related work

Neural Information Processing Systems | Feb-8-2026, 03:16:40 GMT

In Table 1, we compare our agnostic learning results. Our results in this setting come from Theorem 3.3. We note that the sample complexity for Diakonikolas et al. To prove Lemma 3.5, we use the following result of Yehudai and Shamir [35]. We first consider the case when σ satisfies Assumption 3.1.

artificial intelligence, machine learning, nullx null 2 2, (15 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.31)
