A More on the background

Apr-24-2026, 13:30:38 GMT–Neural Information Processing Systems

A.1 SVRG and SCSG

Here we provide the pseudocode for SVRG (Algorithm 2) and SCSG (Algorithm 3) as presented in Lei et al. [35]. The idea of SVRG (Algorithm 2) is to reuse past full gradient computations (line 3) to reduce the variance of the current stochastic gradient estimate (line 7) before the parameter update (line 8). Note that N = 1 corresponds to a GD step (i.e., v reduces to the full gradient, since the two stochastic terms are evaluated at the same point and cancel). SVRG achieves an O(1/T) convergence rate using the semi-stochastic gradient. The key difference between the two is that SCSG (Algorithm 3) considers a sequence of time-varying batch sizes (B_t and b_t) and employs geometric sampling to generate the number of parameter update steps N_t in each iteration (line 6), instead of fixing the batch sizes and the number of updates as done in SVRG. In particular, when finding an ε-approximate solution (Definition 1) for smooth non-convex objectives, Lei et al. [35] prove that SCSG is never worse than SVRG in convergence rate and significantly outperforms SVRG when the required ε is small.
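The two update schemes described above can be illustrated with a minimal sketch. It is not a reproduction of Algorithms 2 and 3: the one-dimensional least-squares objective, the function names (grad_i, batch_grad, sample_geometric, svrg, scsg), the choice of the last inner iterate as the next snapshot, and the geometric parameter B_t / (B_t + b_t) (which gives N_t a mean of B_t / b_t) are all assumptions made here for illustration.

```python
import random

def grad_i(x, a, b):
    """Gradient at x of the component loss 0.5 * (a * x - b) ** 2."""
    return a * (a * x - b)

def batch_grad(x, batch):
    """Average component gradient over a list of (a, b) pairs."""
    return sum(grad_i(x, a, b) for a, b in batch) / len(batch)

def sample_geometric(gamma, rng):
    """Draw N with P(N = k) = gamma**k * (1 - gamma), k = 0, 1, 2, ..."""
    n = 0
    while rng.random() < gamma:
        n += 1
    return n

def svrg(data, x0, step, epochs, n_inner, rng):
    """SVRG sketch: fixed number of inner updates per full-gradient snapshot."""
    x_tilde = x0
    for _ in range(epochs):
        g = batch_grad(x_tilde, data)  # full gradient at the snapshot
        x = x_tilde
        for _ in range(n_inner):
            a, b = rng.choice(data)
            # Semi-stochastic gradient: stochastic term corrected by the snapshot.
            v = grad_i(x, a, b) - grad_i(x_tilde, a, b) + g
            x -= step * v
        x_tilde = x  # assumption: the last iterate becomes the next snapshot
    return x_tilde

def scsg(data, x0, step, batch_sizes, mini_batch_sizes, rng):
    """SCSG sketch: time-varying batch sizes and a geometric number of updates."""
    x_tilde = x0
    for big_b, small_b in zip(batch_sizes, mini_batch_sizes):
        batch = rng.sample(data, big_b)
        g = batch_grad(x_tilde, batch)  # batch gradient replaces the full gradient
        # Assumed parameterization: the number of updates has mean big_b / small_b.
        n_steps = sample_geometric(big_b / (big_b + small_b), rng)
        x = x_tilde
        for _ in range(n_steps):
            mini = rng.sample(data, small_b)
            v = batch_grad(x, mini) - batch_grad(x_tilde, mini) + g
            x -= step * v
        x_tilde = x
    return x_tilde

# Toy data with b = 3 * a, so the minimizer of the average loss is x = 3.
rng = random.Random(0)
data = [(a, 3.0 * a) for a in (0.5, 0.75, 1.0, 1.25, 1.5, 0.6, 0.9, 1.4)]
x_svrg = svrg(data, 0.0, 0.1, epochs=30, n_inner=16, rng=rng)
x_scsg = scsg(data, 0.0, 0.1, [min(8, 2 + t) for t in range(300)], [1] * 300, rng)
print(x_svrg, x_scsg)
```

With n_inner = 1 the inner loop evaluates both stochastic terms at the snapshot, so they cancel and the update is a plain gradient-descent step, which matches the N = 1 remark above. In the SCSG sketch the batch size grows from 2 to the full dataset size, so later iterations behave like SVRG with a random number of inner updates.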


