S)GD over Diagonal Linear Networks Implicit Bias Large and Edge of Stability

Neural Information Processing Systems 

Currently, most theoretical works on implicit regularisation have primarily focused on continuous time approximations of (S)GD where the impact of crucial hyperparameters such as the stepsize and the minibatch size are ignored. One such common simplification is to analyse gradient flow, which is a continuous time limit of GD and minibatch SGD with an infinitesimal stepsize. By definition, this analysis does not capture the effect of stepsize or stochasticity.

Similar Docs  Excel Report  more

TitleSimilaritySource
None found