LOVM: Language-Only Vision Model Selection

Neural Information Processing Systems 

Pre-trained multi-modal vision-language models (VLMs) are becoming increasingly popular due to their exceptional performance on downstream vision applications, particularly in the few- and zero-shot settings.
