[Figure panel titles: Corrupted labels, Gaussian, Random pixels, Shuffled pixels]

Neural Information Processing Systems 

Figure 7: Accuracy curves of a model trained on the noisy CIFAR10 training set with an 80% noise rate. The horizontal dotted line shows the percentage of clean data in the training set. It shows that our observations in Section 2 hold true even when extreme label noise is injected.

A.1 Double descent phenomenon

Following previous work [12], we optimize all models using the Adam [7] optimizer with a fixed learning rate of 0.0001, a batch size of 128, common data augmentation, and a weight decay of 0 for 4,000 epochs.

A.2 Adversarial training

[17] reported that imperceptibly small perturbations around input data (i.e., adversarial examples) can cause ERM-trained deep neural networks to make arbitrary predictions.
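
As a toy illustration of how a small per-coordinate perturbation can change a prediction, the sketch below uses a hypothetical 1000-dimensional linear classifier rather than a deep network. It is not the setup used in this paper or in [17]: the weights, the input, the budget `eps`, and the sign-of-gradient step (in the style of the fast gradient sign method) are all assumptions chosen for illustration.

```python
# Toy sketch (assumed setup, not the paper's): a small sign-based
# perturbation flips the prediction of a hypothetical linear classifier.

def sign(v: float) -> int:
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def predict(w: list, b: float, x: list) -> int:
    """Binary linear classifier: class 1 if w.x + b > 0, else class 0."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score > 0 else 0

def sign_perturb(w: list, x: list, label: int, eps: float) -> list:
    """Move every coordinate by eps in the direction that pushes the
    score away from `label`. For a linear score the gradient with
    respect to x is simply w, so only sign(w) is needed."""
    direction = -1.0 if label == 1 else 1.0
    return [xi + direction * eps * sign(wi) for xi, wi in zip(x, w)]

# Hypothetical model and input (values are illustrative assumptions).
DIM = 1000
w = [0.05 if i % 2 == 0 else -0.05 for i in range(DIM)]
b = 0.0
x = [0.5 + 0.01 * sign(wi) for wi in w]  # inputs near mid-gray on a [0, 1] scale

clean_pred = predict(w, b, x)
x_adv = sign_perturb(w, x, clean_pred, eps=0.02)
adv_pred = predict(w, b, x_adv)

# Largest change applied to any single coordinate.
max_change = max(abs(a - c) for a, c in zip(x_adv, x))

print(clean_pred, adv_pred, max_change)
```

Each coordinate moves by at most `eps`, but the shifts add up across all 1000 dimensions, which is why the score can change sign even though no single coordinate changes much.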