Explaining Similarity in Vision-Language Encoders with Weighted Banzhaf Interactions
