Explaining Away Attacks Against Neural Networks

Mar-6-2020, arXiv.org Machine Learning

We investigate the problem of identifying adversarial attacks on image-based neural networks. We present experimental results showing significant discrepancies between the explanations generated for a model's predictions on clean data and on adversarial data. Building on this observation, we propose a framework that identifies whether a given input is adversarial based on the explanations given by the model. Code for our experiments can be found here: https://github.com/seansaito/
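The idea of detecting adversarial inputs from their explanations can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy linear scorer `W` stands in for an image network, `explain` uses gradient-times-input attribution, `attack` is an FGSM-style signed step, and the nearest-centroid detector in `is_adversarial` is a placeholder for whatever detector the authors actually train.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=16)  # hypothetical toy linear scorer, not the paper's model


def logit(x):
    # Score of each input under the toy model.
    return x @ W


def explain(x):
    # Gradient-times-input attribution; for a linear scorer the gradient is W.
    return x * W


def attack(x, eps=0.5):
    # FGSM-style signed step that lowers the logit of every input.
    return x - eps * np.sign(W)


def fit_centroids(clean_expl, adv_expl):
    # Mean explanation vector for each group.
    return clean_expl.mean(axis=0), adv_expl.mean(axis=0)


def is_adversarial(expl, centroids):
    # Flag an input when its explanation lies closer to the adversarial centroid.
    c_clean, c_adv = centroids
    d_clean = np.linalg.norm(expl - c_clean, axis=1)
    d_adv = np.linalg.norm(expl - c_adv, axis=1)
    return d_adv < d_clean


x_clean = rng.normal(size=(200, 16))
x_adv = attack(x_clean)
centroids = fit_centroids(explain(x_clean), explain(x_adv))
flags_clean = is_adversarial(explain(x_clean), centroids)
flags_adv = is_adversarial(explain(x_adv), centroids)
# Balanced accuracy, measured on the same data used to fit the centroids.
accuracy = ((~flags_clean).mean() + flags_adv.mean()) / 2
```

The detector never looks at the raw inputs or the model's predictions, only at the attribution vectors, which mirrors the framing in the abstract. A realistic version would use a deep network, a held-out evaluation set, and a learned detector.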

adversarial attack, explanation, prediction

Country:
- Asia > Singapore (0.05)
- North America > United States
  - California > Santa Clara County > Palo Alto (0.05)

Genre:
- Research Report > New Finding (0.50)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)
