Learning Mask-aware CLIP Representations for Zero-Shot Segmentation (Supplementary Material)
In the supplementary material, we first introduce technical details of the "frozen CLIP" approaches in Sec. 1. Then the dataset settings are described in Sec. 2.

Figure 1 presents an overview of the "frozen CLIP" approach. It is worth noting that all sub-images are resized to a fixed resolution before being fed into CLIP.

Figure 2: Comparison among three merge operations.

We conduct experiments on three datasets, Pascal-VOC, COCO-Stuff and ADE20K, to evaluate the performance of MAFT. Pascal-VOC: there are 10,582 images for training and 1,449 images for testing. ADE20K: ADE20K contains 25k images for training and 2k images for validation. Pascal-Context is an extension of Pascal-VOC 2010.
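To make the "frozen CLIP" step above concrete, the following is a minimal sketch of how mask proposals could be cropped, resized, and classified with a frozen CLIP model. It assumes the open-source OpenAI CLIP package and PyTorch; the proposal boxes, label list, and prompt template are illustrative placeholders, not the exact MAFT pipeline.

```python
# Hedged sketch: classify cropped sub-images with a frozen CLIP encoder.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)  # CLIP stays frozen
model.eval()

class_names = ["aeroplane", "bicycle", "bird"]  # assumed label subset
text_tokens = clip.tokenize([f"a photo of a {c}" for c in class_names]).to(device)

def classify_proposals(image: Image.Image, boxes):
    """Crop each proposal box, resize it via CLIP's preprocess (so every
    sub-image shares one input size), and score it against the class-name
    text embeddings with cosine similarity."""
    with torch.no_grad():
        txt = model.encode_text(text_tokens)
        txt = txt / txt.norm(dim=-1, keepdim=True)
        crops = torch.stack([preprocess(image.crop(b)) for b in boxes]).to(device)
        img = model.encode_image(crops)
        img = img / img.norm(dim=-1, keepdim=True)
        return (100.0 * img @ txt.T).softmax(dim=-1)  # per-proposal class scores
```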
Supplementary material for "GAMA: Generative Adversarial Multi-Object Scene Attacks"
We also demonstrate GAMA's transfer attack strength in comparison to prior methods under difficult black-box transfer settings, including transfers across different multi-label distributions, transfers to object detection, and robustness to input pre-processing. This can also be seen in the embedding visualizations above, where surrogate and victim models are given in parentheses. As can be seen in Table 3 and Table 4 (the ensemble is denoted as "All"), we do not observe any significant advantage when using multiple surrogates. GAMA remains stronger than prior methods even when the victim pre-processes the perturbed image. Finally, we evaluated CLIP as a "zero-shot prediction" model on the perturbed images from Pascal-VOC: using CLIP's image-text alignment property, we computed the top-2 associated labels for both clean and perturbed images (Figure 2).
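A minimal sketch of the top-2 CLIP "zero-shot prediction" check described above is given below. It assumes the open-source OpenAI CLIP package; the label list, prompt template, and file paths are placeholders rather than the exact GAMA evaluation script.

```python
# Hedged sketch: compare CLIP's top-2 labels on a clean vs. a perturbed image.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

voc_labels = ["aeroplane", "bicycle", "bird", "boat", "bottle"]  # truncated VOC list (assumption)
text = clip.tokenize([f"a photo of a {c}" for c in voc_labels]).to(device)

def top2_labels(image_path: str):
    """Return the two labels whose text embeddings align best with the image."""
    image = preprocess(Image.open(image_path)).unsqueeze(0).to(device)
    with torch.no_grad():
        img = model.encode_image(image)
        txt = model.encode_text(text)
        img = img / img.norm(dim=-1, keepdim=True)
        txt = txt / txt.norm(dim=-1, keepdim=True)
        sims = (img @ txt.T).squeeze(0)
    return [voc_labels[i] for i in sims.topk(2).indices.tolist()]

# Hypothetical usage: inspect how the top-2 labels change after the attack.
print(top2_labels("clean.jpg"), top2_labels("perturbed.jpg"))
```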