Intra-Modal Proxy Learning for Zero-Shot Visual Categorization with CLIP

Neural Information Processing Systems 

Vision-language pre-training methods such as CLIP demonstrate impressive zero-shot performance on visual categorization by using the text embedding of each class name as that class's proxy.
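The text-proxy baseline described above can be sketched as nearest-proxy classification: normalize the image embedding and each class's text embedding, score classes by cosine similarity, and take the highest-scoring class. This is a minimal sketch of standard CLIP-style zero-shot inference, not the method proposed in this paper. The function `zero_shot_classify`, the `logit_scale` value of 100, and the 3-dimensional toy embeddings are hypothetical stand-ins. A real pipeline would obtain the embeddings from CLIP's image and text encoders.

```python
import math

def normalize(v):
    """Scale a vector to unit L2 norm so that dot products equal cosine similarity."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def zero_shot_classify(image_emb, class_text_embs, logit_scale=100.0):
    """Assign an image to the class whose text proxy is most similar to it.

    image_emb: embedding of one image (list of floats).
    class_text_embs: dict mapping class name -> text embedding used as that class's proxy.
    logit_scale: temperature multiplier applied before softmax (assumed value).
    Returns (predicted class name, dict of class probabilities).
    """
    img = normalize(image_emb)
    names = list(class_text_embs)
    logits = []
    for name in names:
        proxy = normalize(class_text_embs[name])
        cos = sum(a * b for a, b in zip(img, proxy))  # cosine similarity to the class proxy
        logits.append(logit_scale * cos)
    # Numerically stable softmax over classes
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    probs = {name: e / total for name, e in zip(names, exps)}
    pred = max(probs, key=probs.get)
    return pred, probs

# Hypothetical toy embeddings standing in for real CLIP encoder outputs.
text_proxies = {
    "cat": [1.0, 0.1, 0.0],
    "dog": [0.1, 1.0, 0.0],
    "car": [0.0, 0.1, 1.0],
}
image = [0.9, 0.2, 0.1]
pred, probs = zero_shot_classify(image, text_proxies)
print(pred)  # cat
```

Because each class is represented only by a text embedding, no labeled images are needed. The prediction depends entirely on how well the text proxy aligns with image embeddings of that class.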
