CAST: Cross-modal Alignment Similarity Test for Vision Language Models
