A Experimental Setups

A.1 Double descent phenomenon

Following previous work [
Neural Information Processing Systems
Figure 7: Accuracy curves of the model trained on the noisy CIFAR10 training set with an 80% noise rate. (Panel: accuracy curves of the model trained using ERM.)

For training, we use an initial learning rate of 0.1, a batch size of 128, and 100 training epochs. We split the training set into two portions: 1) the untouched portion, i.e., the elements of the training set that were left untouched; and 2) the corrupted portion, i.e., the elements in

The learning rate is linearly increased from 0.0003

Following common practice, we use random resizing, cropping, and flipping augmentation during training.

However, they only analyzed the generalization errors in the presence of corrupted labels. This occurs around the epochs between underfitting and overfitting.
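The noisy-label setup above (an 80% noise rate on CIFAR10, with the training set split into an untouched and a corrupted portion) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: we assume symmetric noise in which each corrupted label is replaced by a class drawn uniformly from the other classes, and the names `corrupt_labels`, `portion_accuracy`, and the `seed` parameter are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def corrupt_labels(
    labels: Sequence[int],
    num_classes: int,
    noise_rate: float,
    seed: int = 0,
) -> Tuple[List[int], List[int], List[int]]:
    """Return (noisy_labels, untouched_idx, corrupted_idx).

    Assumption (not from the source): symmetric noise. A random `noise_rate`
    fraction of indices is selected, and each selected label is replaced by a
    class drawn uniformly from the other num_classes - 1 classes, so every
    element of the corrupted portion is actually mislabeled.
    """
    if not 0.0 <= noise_rate <= 1.0:
        raise ValueError("noise_rate must be in [0, 1]")
    rng = random.Random(seed)
    n = len(labels)
    num_corrupt = int(round(noise_rate * n))

    # Corrupted portion: a uniformly random subset of indices.
    corrupted_idx = sorted(rng.sample(range(n), num_corrupt))
    corrupted_set = set(corrupted_idx)
    # Untouched portion: everything else keeps its original label.
    untouched_idx = [i for i in range(n) if i not in corrupted_set]

    noisy = list(labels)
    for i in corrupted_idx:
        # Draw from {0, ..., K-2}, then shift values >= the true label up by
        # one; this is uniform over the K-1 classes other than labels[i].
        new = rng.randrange(num_classes - 1)
        noisy[i] = new if new < labels[i] else new + 1
    return noisy, untouched_idx, corrupted_idx


def portion_accuracy(
    preds: Sequence[int], targets: Sequence[int], idx: Sequence[int]
) -> float:
    """Accuracy of `preds` against `targets`, restricted to the indices in `idx`."""
    if not idx:
        return float("nan")
    return sum(preds[i] == targets[i] for i in idx) / len(idx)


if __name__ == "__main__":
    # Toy stand-in for CIFAR10 labels: 1,000 examples over 10 classes.
    true = [i % 10 for i in range(1000)]
    noisy, untouched, corrupted = corrupt_labels(true, num_classes=10, noise_rate=0.8)
    print(len(untouched), len(corrupted))  # 200 800

    # A model that has fully memorized the noisy labels:
    preds = noisy
    print(portion_accuracy(preds, noisy, corrupted))  # 1.0 (fits the noise)
    print(portion_accuracy(preds, true, corrupted))   # 0.0 (never the true class)
```

Tracking accuracy on the corrupted portion against the noisy labels indicates how much of the noise the model has memorized, while the same quantity against the original labels indicates how often it still predicts the true class.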
Aug-16-2025, 22:21:11 GMT