Adversarial Examples are not Bugs, they are Features
– Neural Information Processing Systems
We demonstrate that adversarial examples can be directly attributed to the presence of non-robust features: features (derived from patterns in the data distribution) that are highly predictive, yet brittle and (thus) incomprehensible to humans. After capturing these features within a theoretical framework, we establish their widespread existence in standard datasets.