
 Sadhika Malladi



Neural Information Processing Systems

It is generally recognized that finite learning rate (LR), in contrast to infinitesimal LR, is important for good generalization in real-life deep nets. Most attempted explanations propose approximating finite-LR SGD with Itô Stochastic Differential Equations (SDEs), but formal justification for this approximation (e.g., Li et al., 2019a) only applies to SGD with tiny LR. Experimental verification of the approximation appears computationally infeasible. The current paper clarifies the picture with the following contributions: (a) An efficient simulation algorithm SVAG that provably converges to the conventionally used Itô SDE approximation.
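For concreteness, below is a minimal NumPy sketch of one way to implement an SVAG step, assuming the combination rule stated in the paper: each step mixes two independent minibatch gradients with coefficients (1 +/- sqrt(2l - 1))/2, which preserves the expected gradient while amplifying the noise covariance by a factor of l, and then applies the reduced learning rate eta/l, so l times as many steps cover the same continuous time horizon. The toy objective, the helper noisy_quadratic_grad, and all hyperparameter values are illustrative assumptions, not the paper's experimental setup.

    import numpy as np

    def svag_step(w, stoch_grad, lr, l, rng):
        """One SVAG step with hyperparameter l.

        Draws two independent minibatch gradients and combines them so
        that the mean is preserved while the noise covariance is
        amplified by a factor of l; the step uses learning rate lr / l.
        As l grows, the trajectory approaches the Ito SDE approximation
        of SGD over the same continuous time horizon.
        """
        r = np.sqrt(2.0 * l - 1.0)
        g1 = stoch_grad(w, rng)
        g2 = stoch_grad(w, rng)  # independent of g1
        g_hat = 0.5 * (1.0 + r) * g1 + 0.5 * (1.0 - r) * g2
        return w - (lr / l) * g_hat

    def noisy_quadratic_grad(w, rng, noise_std=1.0):
        # Toy stand-in for minibatch gradients: the full gradient of
        # L(w) = 0.5 * ||w||^2 plus isotropic Gaussian noise.
        return w + noise_std * rng.standard_normal(w.shape)

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        lr, steps, l = 0.1, 1000, 8   # SVAG runs l * steps iterations
        w = np.ones(10)
        for _ in range(steps * l):    # same horizon lr * steps as SGD
            w = svag_step(w, noisy_quadratic_grad, lr, l, rng)
        print("final loss:", 0.5 * np.dot(w, w))

Note that l = 1 reduces to ordinary SGD (the coefficients become 1 and 0), while larger l refines the discretization toward the Itô SDE limit at proportionally higher simulation cost.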


Since decreasing LR along the linear scaling rule (LSR) converges to a different limit than SVAG does, it is natural to ask which part of the approximation in Lemma 4.7 fails for the former. By scrutinizing the proof of Lemma 4.7, we can see that (i) and (ii) still hold for any stochastic discrete process with LR


