Visual Riddles: a Commonsense and World Knowledge Challenge for Large Vision and Language Models

Neural Information Processing Systems 

Imagine observing someone scratching their arm; to understand why, additional context would be necessary. However, spotting a mosquito nearby would immediately offer a likely explanation for the person's discomfort, thereby alleviating
