Universal Post-Training Reverse-Engineering Defense Against Backdoors in Deep Neural Networks

Li, Xi; Wang, Hang; Miller, David J.; Kesidis, George

Feb-3-2024–arXiv.org Artificial Intelligence

A variety of defenses have been proposed against backdoor attacks on deep neural network (DNN) classifiers. Universal methods seek to reliably detect and/or mitigate backdoors irrespective of the mechanism the attacker uses to incorporate them, while reverse-engineering methods often explicitly assume one. In this paper, we describe a new detector that: relies on the internal feature maps of the defended DNN to detect and reverse-engineer the backdoor and identify its target class; can operate post-training (without access to the training dataset); is highly effective for various incorporation mechanisms (i.e., is universal); and has low computational overhead and is therefore scalable. We evaluate our detection approach against different attacks on a benchmark CIFAR-10 image classifier.
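Reverse-engineering detectors commonly finish by estimating one statistic per candidate target class (for example, the size of the smallest perturbation that sends clean inputs to that class) and then running an unsupervised outlier test to pick out the backdoor target. The abstract does not specify this paper's detection statistic, so the sketch below illustrates only that generic final step, using the median-absolute-deviation (MAD) anomaly index popularized by Neural Cleanse. The function names, the threshold of 2.0, and the per-class values are all hypothetical, not taken from the paper.

```python
import math
import statistics

# Scales the MAD so it estimates the standard deviation of Gaussian data.
MAD_SCALE = 1.4826


def anomaly_indices(stats):
    """Return the MAD-based anomaly index of each per-class statistic.

    stats: dict mapping class label -> scalar statistic (hypothetical input).
    """
    med = statistics.median(stats.values())
    mad = statistics.median(abs(v - med) for v in stats.values())
    if mad == 0:
        # Degenerate spread: only values that differ from the median are anomalous.
        return {c: (0.0 if v == med else math.inf) for c, v in stats.items()}
    scale = MAD_SCALE * mad
    return {c: abs(v - med) / scale for c, v in stats.items()}


def flag_target_classes(stats, threshold=2.0):
    """Return, sorted, the classes whose statistic is anomalously SMALL.

    Assumes the statistic is a reverse-engineered perturbation size, so a
    backdoor target class shows up as an unusually small value; values at or
    above the median are never flagged.
    """
    med = statistics.median(stats.values())
    idx = anomaly_indices(stats)
    return sorted(c for c, v in stats.items() if v < med and idx[c] > threshold)


# Made-up perturbation sizes for a 10-class model; class 3 is the outlier.
norms = {0: 4.1, 1: 3.9, 2: 4.3, 3: 0.6, 4: 4.0,
         5: 3.8, 6: 4.2, 7: 4.4, 8: 3.7, 9: 4.0}
print(flag_target_classes(norms))  # [3]
```

Using the median and MAD rather than the mean and standard deviation keeps the single outlying class from inflating the spread estimate against which it is tested.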

backdoor, backdoor pattern, perturbation
