Aggregate-and-Adapt Natural Language Prompts for Downstream Generalization of CLIP

Neural Information Processing Systems 

Large pretrained vision-language models like CLIP have shown promising generalization capability, but may struggle in specialized domains (e.g., satellite imagery)
