(S)GD over Diagonal Linear Networks: Implicit Bias, Large Stepsizes and Edge of Stability
Neural Information Processing Systems
Most theoretical works on implicit regularisation have focused on continuous-time approximations of (S)GD in which the impact of crucial hyperparameters, such as the stepsize and the minibatch size, is ignored. A common simplification is to analyse gradient flow, the continuous-time limit of GD and minibatch SGD with an infinitesimal stepsize. By definition, such an analysis captures neither the effect of the stepsize nor that of stochasticity.
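As a minimal numerical sketch of the limit the abstract refers to (a toy quadratic loss of my own choosing, not a problem from the paper): GD run for a fixed total "time" T with stepsize eta converges to the gradient-flow solution as eta shrinks, so any effect specific to a finite stepsize disappears in the flow analysis.

```python
import numpy as np

# Assumed toy problem: L(w) = 0.5 * w^T A w with diagonal A.
# Gradient flow dw/dt = -A w has the closed form w(t) = exp(-A t) w0,
# while GD iterates w_{k+1} = w_k - eta * A w_k approximate it.
A = np.diag([1.0, 3.0])
w0 = np.array([1.0, 1.0])

def gd(eta, T):
    """Run GD for T/eta steps so the total integrated 'time' T is fixed."""
    w = w0.copy()
    for _ in range(int(T / eta)):
        w = w - eta * (A @ w)  # gradient of 0.5 * w^T A w is A w
    return w

T = 1.0
flow = np.exp(-np.diag(A) * T) * w0  # exact gradient-flow solution at time T

for eta in [0.1, 0.01, 0.001]:
    err = np.linalg.norm(gd(eta, T) - flow)
    print(f"eta={eta:5.3f}  |GD - flow| = {err:.2e}")
```

The printed gap shrinks roughly linearly in eta, which is exactly why a gradient-flow analysis cannot distinguish small from large stepsizes.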