Weight fluctuations in (deep) linear neural networks and a derivation of the inverse-variance flatness relation
