Do Counterfactual Examples Complicate Adversarial Training?
Eric Yeats, Cameron Darwin, Eduardo Ortega, Frank Liu, Hai Li
arXiv.org Artificial Intelligence
We leverage diffusion models to study the robustness-performance tradeoff of robust classifiers. Our approach introduces a simple, pretrained diffusion method to generate low-norm counterfactual examples (CEs): semantically altered data that result in a different true class membership. We report that the confidence and accuracy of robust models on their clean training data are associated with the proximity of the data to their CEs. Moreover, robust models perform very poorly when evaluated on the CEs directly, as they become increasingly invariant to the low-norm, semantic changes that CEs introduce. The results indicate a significant overlap between non-robust and semantic features, countering the common assumption that non-robust features are not interpretable.
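The abstract does not specify the authors' implementation, but the CE-generation step it describes can be illustrated with a common recipe: partially noise a clean input with a pretrained diffusion model, then denoise it under classifier guidance toward a different class, which tends to yield a semantic edit with a small perturbation norm. The sketch below is a minimal illustration under those assumptions, not the paper's method; `eps_model`, `clf`, `betas`, `t0`, and `guide` are hypothetical placeholders the reader would supply.

```python
# Minimal sketch: diffusion-based counterfactual generation via partial
# noising + classifier-guided DDPM denoising. All model handles and
# hyperparameters here are illustrative assumptions, not the paper's code.
import torch
import torch.nn.functional as F

@torch.no_grad()
def make_counterfactual(x, target, eps_model, clf, betas, t0=250, guide=2.0):
    """Noise x forward to step t0, then denoise toward `target` class.

    x          : (B, C, H, W) clean images in the model's input range
    target     : (B,) long tensor of desired (counterfactual) class labels
    eps_model  : pretrained denoiser, eps_model(x_t, t) -> predicted noise
    clf        : differentiable classifier, clf(x) -> (B, num_classes) logits
    betas      : (T,) diffusion noise schedule
    """
    alphas = 1.0 - betas
    abar = torch.cumprod(alphas, dim=0)  # \bar{alpha}_t

    # Forward diffusion: jump directly to timestep t0 in closed form.
    noise = torch.randn_like(x)
    x_t = abar[t0].sqrt() * x + (1 - abar[t0]).sqrt() * noise

    for t in range(t0, -1, -1):
        t_batch = torch.full((x.shape[0],), t, device=x.device, dtype=torch.long)
        eps = eps_model(x_t, t_batch)  # predicted noise at step t

        # Classifier guidance: shift eps with the gradient of
        # log p(target | x_t), as in standard guided diffusion.
        with torch.enable_grad():
            x_in = x_t.detach().requires_grad_(True)
            logp = F.log_softmax(clf(x_in), dim=-1)[range(len(target)), target].sum()
            grad = torch.autograd.grad(logp, x_in)[0]
        eps = eps - guide * (1 - abar[t]).sqrt() * grad

        # Standard DDPM ancestral reverse step.
        mean = (x_t - betas[t] / (1 - abar[t]).sqrt() * eps) / alphas[t].sqrt()
        x_t = mean + (betas[t].sqrt() * torch.randn_like(x) if t > 0 else 0.0)

    # Return the counterfactual and its per-sample L2 distance from x,
    # the "low-norm" quantity the abstract relates to model confidence.
    return x_t, (x_t - x).flatten(1).norm(dim=1)
```

Starting from a moderate `t0` (rather than pure noise) is what keeps the counterfactual close to the original input: the reverse process can only move the sample within the semantic neighborhood that the remaining noise budget allows.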
Apr-17-2024