Prompting the Unseen: Detecting Hidden Backdoors in Black-Box Models
Zi-Xuan Huang, Jia-Wei Chen, Zhi-Peng Zhang, Chia-Mu Yu
Visual prompting (VP) is a recent technique that adapts well-trained frozen models from source-domain tasks to target-domain tasks. This study examines VP's benefits for black-box model-level backdoor detection. The visual prompt in VP maps class subspaces between the source and target domains. We identify a misalignment, termed class subspace inconsistency, between clean and poisoned datasets.

Deep neural networks (DNNs) are widely used in complex applications but require extensive computational power to train, leading to significant costs; as a result, training is often outsourced or models are obtained from third parties. However, such DNNs can contain backdoors (Gu et al., 2017; Liu et al., 2018b; Tang et al., 2021; Qi et al., 2023b; Nguyen & Tran, 2021; Chen et al., 2017), which manipulate the model's response to inputs carrying a specific trigger (such as a particular pixel pattern) while leaving behavior on other inputs intact. In a backdoor attack, the attacker embeds the trigger in the training data, causing the model to associate the trigger with a chosen outcome and to misclassify inputs that contain it. Black-box backdoor detection, which relies only on black-box queries to the suspicious model (i.e., the model to be inspected), is therefore gaining attention.
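The trigger-embedding step described above can be pictured with a minimal NumPy sketch in the spirit of BadNets-style data poisoning (Gu et al., 2017); the patch size, location, poison rate, and image layout below are illustrative assumptions, not the paper's setup.

```python
# Hedged sketch: stamp a fixed pixel-pattern trigger onto a fraction of the
# training images and flip their labels to an attacker-chosen target class,
# so the trained model associates the trigger with that class.
import numpy as np

def poison_dataset(images, labels, target_class=0, poison_rate=0.1, seed=0):
    """images: (N, H, W, C) uint8 array; labels: (N,) int array.
    Returns a poisoned copy of both arrays."""
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    n_poison = int(poison_rate * len(images))
    idx = rng.choice(len(images), size=n_poison, replace=False)
    images[idx, -3:, -3:, :] = 255  # trigger: 3x3 white patch, bottom-right
    labels[idx] = target_class      # relabel poisoned samples to the target
    return images, labels
```

A model trained on the poisoned set behaves normally on clean inputs but predicts `target_class` whenever the patch is present, which is the behavior a model-level detector must expose.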
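As a rough illustration of the VP setup mentioned in the opening sentences, the PyTorch sketch below adds a learnable pixel prompt to target-domain inputs before passing them to a frozen source-domain classifier, with a fixed mapping from source-class logits to target classes; the image size, border width, and `label_map` are assumptions for illustration, not the paper's configuration.

```python
# Hedged sketch of visual prompting: only the border prompt is trainable;
# the source-domain model stays frozen, and a fixed label mapping
# reinterprets its logits as target-class scores.
import torch
import torch.nn as nn

class VisualPrompt(nn.Module):
    def __init__(self, image_size=224, pad=16):
        super().__init__()
        self.prompt = nn.Parameter(torch.zeros(3, image_size, image_size))
        mask = torch.zeros(1, image_size, image_size)
        mask[:, :pad, :] = 1
        mask[:, -pad:, :] = 1
        mask[:, :, :pad] = 1
        mask[:, :, -pad:] = 1
        self.register_buffer("mask", mask)  # prompt pixels live on the border

    def forward(self, x):                   # x: (N, 3, H, W) target images
        return x + self.prompt * self.mask

def prompted_logits(frozen_model, prompt, x, label_map):
    """Classify prompted target images with the frozen source model.
    frozen_model parameters should have requires_grad=False; label_map is a
    LongTensor holding one source-class index per target class."""
    source_logits = frozen_model(prompt(x))
    return source_logits[:, label_map]      # (N, num_target_classes)
```

Training the prompt on a clean target dataset and checking how well the prompted model classifies is the kind of query-only procedure that makes VP attractive for black-box inspection of a suspicious model.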
arXiv.org Artificial Intelligence
Nov-14-2024
- Country:
- Asia (0.28)
- Genre:
- Research Report > Experimental Study (0.34)
- Industry:
- Information Technology > Security & Privacy (1.00)
- Transportation > Air (0.81)