Visual-Text Cross Alignment: Refining the Similarity Score in Vision-Language Models
