Learning Mask-aware CLIP Representations for Zero-Shot Segmentation (Supplementary material)

Anonymous Author(s)

Feb-14-2026, 01:47:49 GMT–Neural Information Processing Systems

In the supplementary material, we first introduce technical details of the "frozen CLIP" approaches in Sec. 1. Then the dataset settings are shown in Sec. 2.

Figure 1 presents an overview of the "frozen CLIP" approach. It's worth noting that all sub-images are resized to [...]

Figure 2: Comparison among three merge operations.

[...] Pascal-VOC, COCO-Stuff and ADE20K, to evaluate the performance of MAFT.

Pascal-VOC: There are 10,582 images for training and 1,449 images for testing.

ADE20K: ADE20K contains 25k images for training and 2k images for validation.

Pascal-Context is an extension of Pascal-VOC 2010.



