Prompting the Unseen: Detecting Hidden Backdoors in Black-Box Models
Zi-Xuan Huang, Jia-Wei Chen, Zhi-Peng Zhang, Chia-Mu Yu
Visual prompting (VP) is a recent technique that adapts well-trained frozen models from source-domain tasks to target-domain tasks. This study examines VP's benefits for black-box model-level backdoor detection. The visual prompt in VP maps class subspaces between the source and target domains. We identify a misalignment, termed class subspace inconsistency, between clean and poisoned datasets.

Deep neural networks (DNNs) are widely used in complex applications but require extensive computational power to train, leading to significant costs; as a result, training is often outsourced or models are obtained from third parties. However, such DNNs can contain backdoors (Gu et al., 2017; Liu et al., 2018b; Tang et al., 2021; Qi et al., 2023b; Nguyen & Tran, 2021; Chen et al., 2017), which manipulate the model's response to inputs carrying a specific trigger (such as a particular pixel pattern) while leaving behavior on other inputs intact. In a backdoor attack, the attacker embeds the trigger in the training data, causing the model to associate the trigger with a chosen outcome and to misclassify inputs that contain it. Black-box backdoor detection, which relies only on black-box queries to the suspicious model (i.e., the model to be inspected), is therefore gaining attention.
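The trigger-embedding step described above can be pictured with a minimal NumPy sketch in the spirit of BadNets-style data poisoning (Gu et al., 2017); the patch size, location, poison rate, and image layout below are illustrative assumptions, not the paper's setup.

```python
# Hedged sketch: stamp a fixed pixel-pattern trigger onto a fraction of the
# training images and flip their labels to an attacker-chosen target class,
# so the trained model associates the trigger with that class.
import numpy as np

def poison_dataset(images, labels, target_class=0, poison_rate=0.1, seed=0):
    """images: (N, H, W, C) uint8 array; labels: (N,) int array.
    Returns a poisoned copy of both arrays."""
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    n_poison = int(poison_rate * len(images))
    idx = rng.choice(len(images), size=n_poison, replace=False)
    images[idx, -3:, -3:, :] = 255  # trigger: 3x3 white patch, bottom-right
    labels[idx] = target_class      # relabel poisoned samples to the target
    return images, labels
```

A model trained on the poisoned set behaves normally on clean inputs but predicts `target_class` whenever the patch is present, which is the behavior a model-level detector must expose.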
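As a rough illustration of the VP setup mentioned in the opening sentences, the PyTorch sketch below adds a learnable pixel prompt to target-domain inputs before passing them to a frozen source-domain classifier, with a fixed mapping from source-class logits to target classes; the image size, border width, and `label_map` are assumptions for illustration, not the paper's configuration.

```python
# Hedged sketch of visual prompting: only the border prompt is trainable;
# the source-domain model stays frozen, and a fixed label mapping
# reinterprets its logits as target-class scores.
import torch
import torch.nn as nn

class VisualPrompt(nn.Module):
    def __init__(self, image_size=224, pad=16):
        super().__init__()
        self.prompt = nn.Parameter(torch.zeros(3, image_size, image_size))
        mask = torch.zeros(1, image_size, image_size)
        mask[:, :pad, :] = 1
        mask[:, -pad:, :] = 1
        mask[:, :, :pad] = 1
        mask[:, :, -pad:] = 1
        self.register_buffer("mask", mask)  # prompt pixels live on the border

    def forward(self, x):                   # x: (N, 3, H, W) target images
        return x + self.prompt * self.mask

def prompted_logits(frozen_model, prompt, x, label_map):
    """Classify prompted target images with the frozen source model.
    frozen_model parameters should have requires_grad=False; label_map is a
    LongTensor holding one source-class index per target class."""
    source_logits = frozen_model(prompt(x))
    return source_logits[:, label_map]      # (N, num_target_classes)
```

Training the prompt on a clean target dataset and checking how well the prompted model classifies is the kind of query-only procedure that makes VP attractive for black-box inspection of a suspicious model.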
arXiv.org Artificial Intelligence
Nov-14-2024
- Country:
- Asia (0.28)
- Genre:
- Research Report > Experimental Study (0.34)
- Industry:
- Information Technology > Security & Privacy (1.00)
- Transportation > Air (0.81)