Improving Commonsense in Vision-Language Models via Knowledge Graph Riddles

Open in new window