(S)GD over Diagonal Linear Networks: Implicit Bias, Large Stepsizes and Edge of Stability
