If CLIP Could Talk: Understanding Vision-Language Model Representations Through Their Preferred Concept Descriptions
