Idealised Bayesian Neural Networks Cannot Have Adversarial Examples: Theoretical and Empirical Study

Yarin Gal, Lewis Smith

arXiv.org Machine Learning 

We prove that idealised discriminative Bayesian neural networks, capturing perfect epistemic uncertainty, cannot have adversarial examples: techniques for crafting adversarial examples will necessarily fail to generate perturbed images which fool the classifier. This suggests why MC dropout-based techniques have been observed to be fairly effective against adversarial examples. We support our claims mathematically and empirically. We experiment with HMC on synthetic data derived from MNIST for which we know the ground-truth image density, showing that near-perfect epistemic uncertainty correlates with density under the image manifold, and that adversarial images lie off the manifold. Using our new-found insights, we suggest a new attack for MC dropout-based models by looking for imperfections in uncertainty estimation, and also suggest a mitigation. Lastly, we demonstrate our mitigation on a cats-vs-dogs image classification task with a VGG13 variant.

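As context for the MC dropout-based techniques the abstract refers to, the sketch below shows one common way to estimate epistemic uncertainty with MC dropout: multiple stochastic forward passes with dropout left active, summarised by the mutual information between predictions and model weights. This is a minimal illustration, not the authors' code; the model, number of samples, and use of PyTorch are assumptions.

```python
# Minimal sketch (not the paper's implementation): MC dropout epistemic
# uncertainty via the mutual information between predictions and weights.
# Assumes a PyTorch classifier with dropout layers and no BatchNorm, so that
# calling model.train() only re-enables dropout at test time.
import torch
import torch.nn.functional as F


def mc_dropout_predict(model, x, n_samples=50):
    """Return the mean predictive distribution and the mutual information
    (an epistemic-uncertainty proxy) over n_samples stochastic passes."""
    model.train()  # keep dropout active during inference
    with torch.no_grad():
        probs = torch.stack(
            [F.softmax(model(x), dim=-1) for _ in range(n_samples)]
        )  # shape: (n_samples, batch, classes)

    mean_probs = probs.mean(dim=0)
    # Predictive entropy H[y | x, D]
    predictive_entropy = -(mean_probs * mean_probs.clamp_min(1e-12).log()).sum(-1)
    # Expected entropy E_w[ H[y | x, w] ]
    expected_entropy = -(probs * probs.clamp_min(1e-12).log()).sum(-1).mean(0)
    # Mutual information I[y; w | x, D] = predictive entropy - expected entropy
    mutual_information = predictive_entropy - expected_entropy
    return mean_probs, mutual_information
```

In a defence of this kind, inputs with high mutual information (for example, perturbed images pushed off the data manifold) would be flagged or rejected rather than classified; the paper's argument is that with perfect epistemic uncertainty such inputs cannot be made to fool the classifier.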