GAMA: Generative Adversarial Multi-Object Scene Attacks
The majority of methods for crafting adversarial attacks have focused on scenes with a single dominant object (e.g., images from ImageNet). Natural scenes, on the other hand, contain multiple dominant objects that are semantically related. It is therefore crucial to design attack strategies that look beyond learning on single-object scenes or attacking single-object victim classifiers. Because generative models inherently produce perturbations that transfer strongly to unknown models, this paper presents the first approach that uses them for adversarial attacks on multi-object scenes. To represent the relationships between the different objects in the input scene, we leverage the open-sourced pre-trained vision-language model CLIP (Contrastive Language-Image Pre-training), with the motivation of exploiting the semantics encoded in the language space along with the visual space. We call this attack approach Generative Adversarial Multi-object scene Attacks (GAMA). GAMA demonstrates the utility of the CLIP model as an attacker's tool for training formidable perturbation generators for multi-object scenes. Using the joint image-text features to train the generator, we show that GAMA can craft potent transferable perturbations that fool victim classifiers in various attack settings. For example, GAMA triggers ~16% more misclassification than state-of-the-art generative approaches in black-box settings where both the classifier architecture and the data distribution of the attacker differ from the victim's.
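The core training objective described above can be sketched in a few lines: a perturbation generator is optimized so that the perturbed image's embedding drifts away from the clean scene's text embedding in the joint image-text space. The sketch below uses toy stand-ins (a random linear "image encoder" and a random "caption embedding") in place of CLIP's pre-trained encoders, and the architecture, budget, and loss weights are illustrative assumptions, not the paper's exact recipe.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Frozen encoder -- a toy stand-in for CLIP's pre-trained image encoder.
image_encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64))
for p in image_encoder.parameters():
    p.requires_grad_(False)
# Toy stand-in for the CLIP text embedding of the clean scene's caption.
text_emb = F.normalize(torch.randn(64), dim=-1)

# Perturbation generator: maps an image to a bounded additive perturbation.
generator = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.Conv2d(8, 3, 3, padding=1), nn.Tanh(),
)
eps = 10 / 255  # illustrative L-inf perturbation budget
opt = torch.optim.Adam(generator.parameters(), lr=1e-2)

x = torch.rand(4, 3, 32, 32)  # a batch of "multi-object scenes"
sims = []
for step in range(30):
    x_adv = (x + eps * generator(x)).clamp(0, 1)  # bounded adversarial image
    z = F.normalize(image_encoder(x_adv), dim=-1)
    # Minimize cosine similarity between the perturbed-image embedding and
    # the clean scene's text embedding, misaligning the joint space.
    loss = F.cosine_similarity(z, text_emb.expand_as(z)).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    sims.append(loss.item())

print(sims[0], sims[-1])  # image-text similarity should drop over training
```

At attack time only the trained generator is needed: a single forward pass yields the perturbation, which is what gives generative attacks their speed and transferability compared to iterative per-image optimization.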
- Information Technology > Security & Privacy (0.82)
- Government > Military (0.82)
Supplementary material for "GAMA: Generative Adversarial Multi-Object Scene Attacks"
We also demonstrate GAMA's transfer-attack strength in comparison to prior methods under difficult black-box transfer settings, including transfer across different multi-label distributions, to object detection, and to robust models; this can be seen in the embedding visualizations above, where GAMA's surrogate and victim models are given in parentheses. As can be seen in Table 3 and Table 4 (the ensemble is denoted as All), we do not observe any significant advantage when using multiple surrogates. GAMA also outperforms prior methods even when the victim pre-processes the perturbed image. Finally, we evaluated CLIP as a zero-shot prediction model on the perturbed images from Pascal-VOC and, using CLIP's image-text aligning property, computed the top-2 associated labels for both clean and perturbed images (Figure 2).
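The zero-shot prediction step mentioned above scores an image embedding against one text embedding per class and reports the most similar labels. The sketch below shows that mechanism with random toy embeddings in place of CLIP's real encoders; the class list and embedding size are illustrative assumptions (the paper uses Pascal-VOC labels with the actual CLIP model).

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Toy stand-ins: one normalized text embedding per class prompt, plus the
# normalized embedding of one (perturbed) image. In the real evaluation
# these come from CLIP's text and image encoders.
classes = ["person", "dog", "sofa", "car", "bicycle"]
text_emb = F.normalize(torch.randn(len(classes), 64), dim=-1)
image_emb = F.normalize(torch.randn(64), dim=-1)

# Cosine similarity between the image and every class prompt, then take
# the top-2 associated labels -- CLIP's image-text aligning property.
sims = text_emb @ image_emb
top2 = [classes[i] for i in sims.topk(2).indices.tolist()]
print(top2)
```

Comparing the top-2 labels for clean versus perturbed inputs is what makes this a model-agnostic check: a successful perturbation changes which prompts the image embedding aligns with, even for a classifier the attacker never trained against.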