Gradient Descent on Neural Networks Typically Occurs at the Edge of Stability
