(S)GD over Diagonal Linear Networks: Implicit Bias, Large Stepsizes and Edge of Stability

Feb-12-2026, 10:13:19 GMT–Neural Information Processing Systems

Most theoretical work on implicit regularisation has focused on continuous-time approximations of (S)GD, which ignore the impact of crucial hyperparameters such as the stepsize and the minibatch size. A common simplification is to analyse gradient flow, the continuous-time limit of GD and minibatch SGD as the stepsize becomes infinitesimal. By definition, this analysis captures neither the effect of the stepsize nor that of stochasticity.
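To make the point about stepsize concrete, here is a minimal sketch of full-batch GD on a two-layer diagonal linear network, run once with a small stepsize (a rough stand-in for gradient flow) and once with a larger one. Everything in it is an assumption chosen for illustration and not taken from the abstract: the parametrisation beta = u * v (elementwise), the initialisation u = sqrt(2) * alpha, v = 0, the toy data X and y, and the two stepsizes.

```python
# Hypothetical toy regression problem: 2 samples, 3 features (overparametrised).
X = [[1.0, 0.5, 0.0],
     [0.0, 1.0, 1.0]]
y = [1.0, 0.0]


def predict(u, v, x):
    # Diagonal linear network (assumed form): prediction is <u * v, x>.
    return sum(ui * vi * xi for ui, vi, xi in zip(u, v, x))


def loss(u, v):
    # Mean squared error with the usual 1/(2n) scaling.
    n = len(X)
    return sum((predict(u, v, x) - t) ** 2 for x, t in zip(X, y)) / (2 * n)


def run_gd(stepsize, steps, alpha=0.5):
    # Full-batch gradient descent from u = sqrt(2) * alpha, v = 0 (assumed init).
    n, d = len(X), len(X[0])
    u = [2 ** 0.5 * alpha] * d
    v = [0.0] * d
    for _ in range(steps):
        residuals = [predict(u, v, x) - t for x, t in zip(X, y)]
        # Gradient of the loss with respect to beta = u * v.
        g = [sum(r * x[j] for r, x in zip(residuals, X)) / n for j in range(d)]
        # Chain rule: dL/du = g * v and dL/dv = g * u, updated simultaneously.
        u, v = ([ui - stepsize * gj * vi for ui, vi, gj in zip(u, v, g)],
                [vi - stepsize * gj * ui for ui, vi, gj in zip(u, v, g)])
    return u, v


u_small, v_small = run_gd(stepsize=0.01, steps=20000)
u_large, v_large = run_gd(stepsize=0.2, steps=20000)

beta_small = [a * b for a, b in zip(u_small, v_small)]
beta_large = [a * b for a, b in zip(u_large, v_large)]
print("small stepsize:", beta_small, "loss:", loss(u_small, v_small))
print("large stepsize:", beta_large, "loss:", loss(u_large, v_large))
```

Both runs fit the training data, but because there are more features than samples, many interpolating vectors exist and the two recovered beta vectors need not coincide. In this parametrisation, gradient flow conserves u_i^2 - v_i^2 for every coordinate, whereas one GD step multiplies it by (1 - stepsize^2 * g_i^2). That gives one way to see why a gradient-flow analysis cannot account for the stepsize.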
