SpaNN: Detecting Multiple Adversarial Patches on CNNs by Spanning Saliency Thresholds
Victorica, Mauricio Byrd, Dán, György, Sandberg, Henrik
arXiv.org Artificial Intelligence
This work has been accepted for publication in the IEEE Conference on Secure and Trustworthy Machine Learning (SaTML). The final version will be available on IEEE Xplore.
Abstract: State-of-the-art convolutional neural network models for object detection and image classification are vulnerable to physically realizable adversarial perturbations, such as patch attacks. Existing defenses have focused, implicitly or explicitly, on single-patch attacks, leaving their sensitivity to the number of patches as an open question or rendering them computationally infeasible or inefficient against attacks consisting of multiple patches in the worst cases. In this work, we propose SpaNN, an attack detector whose computational complexity is independent of the expected number of adversarial patches. The key novelty of the proposed detector is that it builds an ensemble of binarized feature maps by applying a set of saliency thresholds to the neural activations of the first convolutional layer of the victim model. It then performs clustering on the ensemble and uses the cluster features as the input to a classifier for attack detection. Contrary to existing detectors, SpaNN does not rely on a fixed saliency threshold for identifying adversarial regions, which makes it robust against white box adversarial attacks. We evaluate SpaNN on four widely used data sets for object detection and classification, and our results show that SpaNN outperforms state-of-the-art defenses by up to 11 and 27 percentage points in the case of object detection and the case of image classification, respectively. Our code is available at https://github.com/gerkbyrd/SpaNN.
Deep learning models achieve state-of-the-art performance on computer vision tasks, but they are vulnerable to adversarial attacks, i.e., input perturbations crafted to change the model's output [1]-[3].
Several digital attack generation methods have been proposed in the past decade [1], [4]-[6], followed by corresponding defense schemes [7]-[10]. These attacks assume the adversary has direct access to the pixels of the input image provided to the model. In more recent years, focus has shifted towards physically realizable attacks [11]. They differ from digital attacks in that they are spatially constrained, and they typically involve applying a printable patch containing an adversarial pattern to an object in the physical scene.
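The detection pipeline outlined in the abstract (an ensemble of binarized feature maps obtained by spanning saliency thresholds, followed by clustering) can be illustrated with a minimal sketch. This is not the authors' implementation; their code is at the repository linked above. The function names `binarized_ensemble`, `count_clusters`, and `cluster_features` are hypothetical, and several choices are assumptions made only for illustration: the saliency map is taken as the channel-summed activation magnitude, thresholds are fractions of the map's maximum, and the clustering step is replaced by counting 4-connected components. The final classifier that would consume the feature vector is omitted.

```python
import numpy as np


def binarized_ensemble(saliency, betas):
    """Binarize a 2-D saliency map once per threshold.

    Each threshold in `betas` is a fraction of the map's maximum
    (an assumption of this sketch, not taken from the paper).
    """
    peak = saliency.max()
    return [saliency >= beta * peak for beta in betas]


def count_clusters(mask):
    """Count 4-connected components of True cells.

    Stand-in for the clustering step; the paper's actual clustering
    method is not reproduced here.
    """
    mask = mask.copy()
    height, width = mask.shape
    count = 0
    for i in range(height):
        for j in range(width):
            if not mask[i, j]:
                continue
            count += 1
            mask[i, j] = False
            stack = [(i, j)]
            # Flood-fill the component so it is counted only once.
            while stack:
                y, x = stack.pop()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < height and 0 <= nx < width and mask[ny, nx]:
                        mask[ny, nx] = False
                        stack.append((ny, nx))
    return count


def cluster_features(activations, betas):
    """Map first-layer activations of shape (C, H, W) to one feature per threshold."""
    # Assumption: saliency is the channel-summed activation magnitude.
    saliency = np.abs(activations).sum(axis=0)
    masks = binarized_ensemble(saliency, betas)
    return np.array([count_clusters(m) for m in masks], dtype=float)


# Synthetic example: low background activity plus two high-activation blobs
# standing in for two adversarial patches.
rng = np.random.default_rng(0)
acts = rng.uniform(0.0, 0.1, size=(4, 16, 16))
acts[:, 2:5, 2:5] += 1.0
acts[:, 10:13, 9:12] += 1.0

betas = np.linspace(0.3, 0.9, 4)  # the spanned set of thresholds
features = cluster_features(acts, betas)
print(features)
```

In this synthetic input the two blobs are separated from the background at every threshold, so each entry of `features` is 2. A detector along these lines would pass such per-threshold cluster features to a trained classifier rather than relying on any single threshold.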
Jun-25-2025