Fixing Hackable Benchmarks for Vision-Language Compositionality
