What's left can't be right -- The remaining positional incompetence of contrastive vision-language models

Open in new window