Is Less More? Exploring Token Condensation as Training-free Adaptation for CLIP
