A Proof of Theorem 1

Neural Information Processing Systems 

Let ŵ be this arg min, which is unique since the objective is strongly convex. Substituting the definition of p and rearranging completes the proof.

Lemma 2. Let ℓ(·; z) be H-smooth, convex, and non-negative for each z, and let the stochastic gradient

For the first term on the right-hand side, we note that, due to the algorithm's projections, all of the

Lemma 3. Let ℓ(·; z) be H-smooth and non-negative for all z and let L

This follows almost immediately from [Theorem 2.1.5

This proof is based on similar ideas as the proof of Lemma 5 and Theorem 2 due to Lan [17]. The key difference is that Lan considers a setting in which the variance of the stochastic gradients is uniformly bounded, while in our setting we do not directly assume any bound on this quantity.
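The lemmas above concern H-smooth, non-negative losses. A standard consequence of smoothness in this setting (the kind of fact that "follows almost immediately from [Theorem 2.1.5]" of Nesterov's textbook) is the self-bounding property ‖∇ℓ(w; z)‖² ≤ 4H·ℓ(w; z), which lets gradient norms be controlled by loss values rather than by a uniform variance bound, as in Lan's setting. The sketch below is only a numerical illustration of that inequality, not the paper's proof; the squared-loss example and all variable names are my own choices.

```python
import numpy as np

# Illustrative check of the self-bounding property for smooth non-negative losses:
#   ||grad l(w; z)||^2 <= 4 * H * l(w; z).
# Example loss: squared loss l(w; x, y) = (w.x - y)^2, which is H-smooth
# with H = 2 * ||x||^2 (its Hessian is 2 * x x^T).

rng = np.random.default_rng(0)

def self_bounding_holds(w, x, y, tol=1e-9):
    loss = (w @ x - y) ** 2          # non-negative loss value
    grad = 2 * (w @ x - y) * x       # gradient with respect to w
    H = 2 * (x @ x)                  # smoothness constant for this sample
    return grad @ grad <= 4 * H * loss + tol

ok = all(
    self_bounding_holds(rng.normal(size=5), rng.normal(size=5), rng.normal())
    for _ in range(1000)
)
print(ok)  # True: the inequality holds on every random draw
```

For the squared loss the check is tight up to a factor of 2, since ‖∇ℓ‖² = 4‖x‖²·ℓ while 4H·ℓ = 8‖x‖²·ℓ.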
