Reviews: Detecting Overfitting via Adversarial Examples
Neural Information Processing Systems
The work addresses the problem of neural networks overfitting to test sets on classification tasks, caused by the community's widespread reuse of the same datasets, and the resulting loss of credibility of reported test error rates, which should reflect performance on 'truly new' data from the same distribution. The proposed test statistic does not affect the training procedure and is simple in principle: if the (importance-reweighted) empirical risk and the empirical risk on adversarially perturbed examples differ by more than a certain threshold (given by concentration bounds), the null hypothesis that the classifier and the test data are independent is rejected.

My main concern is that the type of adversarial example used, bounded translational shifts (for image data), is very limited and likely unrealistic. Shifting the frame of a CIFAR image is quite different from swapping items in a scene; it is less subtle and less 'insidious', unless perhaps a "7" is converted via truncation into a "1". It would have been nice to see example adversarial images to get a sense of how they compare to those typically discussed in the literature, particularly since the use of adversarial examples is a selling point of the work.
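To make the test statistic concrete, here is a minimal sketch of how such a hypothesis test could look. The function names (`shift_image`, `overfitting_test`) and the plain Hoeffding-style threshold are my own illustrative assumptions, not the paper's actual construction; the paper derives its threshold from its own concentration bounds and reweighting scheme.

```python
import numpy as np

def shift_image(img, dy, dx):
    """Bounded translational shift of an HxWxC image.

    Illustrative only: np.roll wraps pixels around the border,
    whereas a real implementation would likely pad and crop.
    """
    return np.roll(img, (dy, dx), axis=(0, 1))

def hoeffding_threshold(n, alpha=0.05, loss_range=1.0):
    # Hoeffding: for n i.i.d. losses in [0, loss_range], the empirical
    # mean deviates from its expectation by more than this threshold
    # with probability at most alpha.
    return loss_range * np.sqrt(np.log(2.0 / alpha) / (2.0 * n))

def overfitting_test(clean_losses, perturbed_losses, weights=None, alpha=0.05):
    """Reject the null hypothesis (classifier independent of the test
    data) when the importance-reweighted clean risk and the risk on
    perturbed examples differ by more than a concentration threshold.
    """
    clean_losses = np.asarray(clean_losses, dtype=float)
    perturbed_losses = np.asarray(perturbed_losses, dtype=float)
    if weights is None:
        weights = np.ones_like(clean_losses)
    # Importance-reweighted empirical risk on the original test points.
    clean_risk = np.average(clean_losses, weights=weights)
    # Empirical risk on the adversarially shifted copies.
    adv_risk = perturbed_losses.mean()
    gap = abs(adv_risk - clean_risk)
    # Stand-in threshold; the paper's bound would replace this.
    tau = hoeffding_threshold(len(clean_losses), alpha=alpha)
    return gap > tau, gap, tau

# Usage with synthetic 0/1 losses in place of a real classifier:
rng = np.random.default_rng(0)
clean = rng.integers(0, 2, size=10_000).astype(float)
shifted = rng.integers(0, 2, size=10_000).astype(float)
reject, gap, tau = overfitting_test(clean, shifted)
print(f"gap={gap:.4f}, threshold={tau:.4f}, reject={reject}")
```

Under the null, both risks estimate the same population quantity, so a gap exceeding the concentration threshold is evidence that the classifier's test performance does not transfer to perturbed (i.e., effectively new) inputs.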