Explaining Vision-Language Similarities in Dual Encoders with Feature-Pair Attributions
