Detecting Adversarial Examples and Other Misclassifications in Neural Networks by Introspection
Jonathan Aigrain, Marcin Detyniecki
Despite achieving excellent performance on a wide variety of tasks, modern neural networks are unable to provide a reliable confidence value that would allow misclassifications to be detected. This limitation is at the heart of what is known as an adversarial example: a slightly modified image for which the network produces a wrong prediction with high confidence. Moreover, this overconfidence issue has also been observed for regular errors and for out-of-distribution data. We tackle this problem through what we call introspection, i.e., using the information provided by the logits of an already pretrained neural network. We show that by training a simple 3-layer neural network on top of the logit activations, we are able to detect misclassifications at a competitive level.
May-22-2019
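The abstract describes the introspection detector only at a high level, so the following is a minimal PyTorch sketch of the idea, not the authors' code: a small 3-layer network is trained on the logits of a frozen, pretrained classifier to predict whether a given prediction is a misclassification. The layer width (`hidden=128`), ReLU activations, optimizer, and training loop are all illustrative assumptions.

```python
# Sketch (assumptions, not the authors' implementation): a 3-layer
# detector that maps a classifier's logits to a misclassification score.
import torch
import torch.nn as nn

class IntrospectionNet(nn.Module):
    def __init__(self, num_logits: int, hidden: int = 128):
        super().__init__()
        # Three fully connected layers: logits -> misclassification score.
        self.net = nn.Sequential(
            nn.Linear(num_logits, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, logits: torch.Tensor) -> torch.Tensor:
        # Raw score; apply a sigmoid to read it as P(misclassified).
        return self.net(logits)

def train_detector(detector, logits, is_error, epochs=10, lr=1e-3):
    """Train on logits collected from the frozen pretrained model.
    `is_error` labels each sample 1 if it was misclassified
    (or adversarial), 0 otherwise."""
    opt = torch.optim.Adam(detector.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(detector(logits).squeeze(1), is_error.float())
        loss.backward()
        opt.step()
    return detector
```

The key design point from the abstract is that the detector never touches the base network's weights or inputs; it consumes only the logit activations, so it can be bolted onto any already-trained classifier.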