Learning Mask-aware CLIP Representations for Zero-Shot Segmentation (Supplementary Material)

Neural Information Processing Systems

In the supplementary material, we first introduce technical details of the "frozen CLIP" approaches in Sec. 1, and then describe the dataset settings in Sec. 2. Figure 1 presents an overview of the "frozen CLIP" approach; it is worth noting that all sub-images are resized to a fixed input size before being fed to CLIP. Figure 2 compares three merge operations. We use three datasets, Pascal-VOC, COCO-Stuff, and ADE20K, to evaluate the performance of MAFT. Pascal-VOC: there are 10,582 images for training and 1,449 images for testing. ADE20K: ADE20K contains 25k images for training and 2k images for validation. Pascal-Context is an extension of Pascal-VOC 2010.
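The "frozen CLIP" pipeline classifies each resized mask sub-image by comparing its CLIP image embedding against CLIP text embeddings of the class names. A minimal sketch of that zero-shot matching step, assuming the embeddings have already been extracted by a frozen CLIP model (the function name and array shapes here are our own illustration, not the paper's code):

```python
import numpy as np

def classify_masks(image_embeds: np.ndarray, text_embeds: np.ndarray) -> np.ndarray:
    """Zero-shot classification of N mask sub-images against C class prompts.

    image_embeds: (N, D) CLIP image embeddings of the resized sub-images.
    text_embeds:  (C, D) CLIP text embeddings of the class names.
    Returns an (N,) array of predicted class indices.
    """
    # L2-normalize so that dot products become cosine similarities,
    # matching CLIP's image-text alignment objective.
    img = image_embeds / np.linalg.norm(image_embeds, axis=1, keepdims=True)
    txt = text_embeds / np.linalg.norm(text_embeds, axis=1, keepdims=True)
    sims = img @ txt.T           # (N, C) cosine-similarity matrix
    return sims.argmax(axis=1)   # most similar class per mask proposal
```

In the real pipeline the embeddings would come from e.g. `model.encode_image` / `model.encode_text` of a frozen CLIP checkpoint; only the similarity-and-argmax step is shown here.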









Supplementary material for "GAMA: Generative Adversarial Multi-Object Scene Attacks"

Neural Information Processing Systems

We also demonstrate GAMA's transfer attack strength in comparison to prior methods under difficult black-box transfer settings, including different multi-label distributions, object detection, and robust victim models. This can be seen in the embedding visualizations above, where surrogate and victim models are given in parentheses. As can be seen in Table 3 and Table 4 (with the ensemble denoted as "All"), we do not observe any significant advantage when using multiple surrogates. GAMA remains stronger than prior methods even when the victim pre-processes the perturbed image. Using CLIP's image-text aligning property, we evaluated CLIP (as a "zero-shot prediction" model) on images from Pascal-VOC and computed the top-2 associated labels for both clean and perturbed images (Figure 2).
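The top-2 labels come from ranking the cosine similarities between one image embedding and the text embeddings of all candidate label prompts. A small sketch of that ranking step, assuming precomputed CLIP embeddings (the helper name and inputs are illustrative, not from the paper):

```python
import numpy as np

def top_k_labels(image_embed: np.ndarray, text_embeds: np.ndarray,
                 labels: list, k: int = 2) -> list:
    """Return the k labels whose CLIP text embeddings are most similar
    to the given CLIP image embedding (cosine similarity)."""
    img = image_embed / np.linalg.norm(image_embed)
    txt = text_embeds / np.linalg.norm(text_embeds, axis=1, keepdims=True)
    sims = txt @ img                     # (C,) similarity per candidate label
    order = np.argsort(sims)[::-1][:k]   # indices of the k highest similarities
    return [labels[i] for i in order]
```

Running this on a clean image and on its adversarially perturbed counterpart, as in the paper's Figure 2, would show whether the perturbation shifts CLIP's top-2 predictions.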