Explaining Similarity in Vision-Language Encoders with Weighted Banzhaf Interactions