Sparse Attention Vectors: Generative Multimodal Model Features Are Discriminative Vision-Language Classifiers

Open in new window