Universal Post-Training Reverse-Engineering Defense Against Backdoors in Deep Neural Networks
Li, Xi, Wang, Hang, Miller, David J., Kesidis, George
–arXiv.org Artificial Intelligence
A variety of defenses have been proposed against backdoors attacks on deep neural network (DNN) classifiers. Universal methods seek to reliably detect and/or mitigate backdoors irrespective of the incorporation mechanism used by the attacker, while reverse-engineering methods often explicitly assume one. In this paper, we describe a new detector that: relies on internal feature map of the defended DNN to detect and reverse-engineer the backdoor and identify its target class; can operate post-training (without access to the training dataset); is highly effective for various incorporation mechanisms (i.e., is universal); and which has low computational overhead and so is scalable. Our detection approach is evaluated for different attacks on a benchmark CIFAR-10 image classifier.
arXiv.org Artificial Intelligence
Feb-3-2024
- Country:
- North America > United States (0.14)
- Genre:
- Research Report (0.64)
- Industry:
- Information Technology > Security & Privacy (1.00)
- Technology: