Catapults in SGD: spikes in the training loss and their impact on generalization through feature learning
