main

Sadhika Malladi

Neural Information Processing Systems 

Since decreasing LR along LSR converges to a different limit than SVAG does, it's natural to ask which part of the approximation in Lemma 4.7 fails for the former. By scrutinizing the proof of Lemma 4.7, we can see (i) and (ii) still hold for any stochastic discrete process with LR