Appendix

Yang Bo
Department of Computing and Software
McMaster University

Neural Information Processing Systems 

As τ is learned from the attention branch. From Appendix A.1, we obtain the gradient of the sample-wise l

Source code for the experiments is available in the zip file. All experiments are implemented in PyTorch and run on a single NVIDIA A100 GPU. For CIFAR-10 and CIFAR-100, we do not perform early stopping, since we do not assume the presence of clean validation data. All test accuracies are recorded from the last epoch of training.
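The evaluation protocol described above (a fixed training budget, no validation-based early stopping, and test accuracy taken from the final epoch) can be sketched as follows. This is a minimal illustration rather than the code shipped in the zip file: the function name `train_without_early_stopping` and the two callbacks are hypothetical, and in an actual PyTorch run the callbacks would wrap the model's training and evaluation passes.

```python
from typing import Callable, List, Tuple


def train_without_early_stopping(
    train_one_epoch: Callable[[int], None],  # hypothetical: one pass over the noisy training set
    evaluate_test: Callable[[], float],      # hypothetical: returns test accuracy in [0, 1]
    num_epochs: int,
) -> Tuple[float, List[float]]:
    """Train for a fixed number of epochs and report the last-epoch test accuracy."""
    if num_epochs < 1:
        raise ValueError("num_epochs must be at least 1")

    history: List[float] = []
    for epoch in range(num_epochs):
        train_one_epoch(epoch)
        history.append(evaluate_test())

    # The reported number is the final epoch, not max(history):
    # no clean validation set is assumed, so no checkpoint is selected.
    return history[-1], history


# Toy usage with made-up accuracies standing in for real evaluation results.
_fake_accuracies = iter([0.62, 0.71, 0.69])
final_acc, history = train_without_early_stopping(
    train_one_epoch=lambda epoch: None,
    evaluate_test=lambda: next(_fake_accuracies),
    num_epochs=3,
)
print(final_acc)
```

Reporting the last epoch rather than the best one avoids using the test set, or an assumed clean validation set, to select a checkpoint.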