Supplementary Material CAPro: Webly Supervised Learning with Cross-modality Aligned Prototypes
Neural Information Processing Systems
S1.1 WebVision1k
It contains 2.4M web images collected from Google and Flickr, which share the same 1k category names as ImageNet1k [1]. For each example, we use all available descriptions, titles, and tags in its metadata to prepare the raw text. In addition, we follow [2, 3] and use the WebVision-Google500 subset for ablation studies, which reduces GPU resource and time consumption without loss of generality. This subset contains 0.48M images from Google covering 500 randomly chosen categories. The testing set of ImageNet1k and its subset ImageNet500 are also used for evaluation.

S1.2 NUS-WIDE (Web)
It contains 0.26M web images from Flickr with 5k unique user tags. Each example is manually annotated with multiple labels drawn from 81 concepts, which were filtered out of the 5k tags.
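The raw text preparation described in S1.1 (using every available description, title, and tag of an example) can be illustrated with a minimal sketch. The field names `description`, `title`, and `tags`, the helper `build_raw_text`, and the choice to join fields with single spaces are assumptions made for illustration; the actual metadata schema and any further text cleaning are not specified here.

```python
from typing import Mapping, Sequence, Union

# Hypothetical metadata keys; the real schema may use different names.
TEXT_FIELDS = ("description", "title", "tags")

FieldValue = Union[str, Sequence[str], None]


def build_raw_text(meta: Mapping[str, FieldValue]) -> str:
    """Concatenate all available text fields of one example into a single string.

    Missing or empty fields are skipped, and runs of whitespace are collapsed.
    """
    parts = []
    for field in TEXT_FIELDS:
        value = meta.get(field)
        if value is None:
            continue
        # A field may hold one string (e.g. a title) or several (e.g. tags).
        items = [value] if isinstance(value, str) else list(value)
        for item in items:
            cleaned = " ".join(str(item).split())
            if cleaned:
                parts.append(cleaned)
    return " ".join(parts)


if __name__ == "__main__":
    example = {"title": "Tench  in a lake", "tags": ["fish", "lake"]}
    print(build_raw_text(example))
```

An example with no usable metadata yields an empty string, so downstream code would need to decide how to handle images that have no accompanying text.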
May-25-2025, 07:56:44 GMT