Good regularity creates large learning rate implicit biases: edge of stability, balancing, and catapult
