Training Instabilities Induce Flatness Bias in Gradient Descent
