Through the Looking Glass: Common Sense Consistency Evaluation of Weird Images
Rykov, Elisei; Petrushina, Kseniia; Titova, Kseniia; Razzhigaev, Anton; Panchenko, Alexander; Konovalov, Vasily
arXiv.org Artificial Intelligence
Measuring how realistic images look is a complex task in artificial intelligence research. For example, an image of a boy with a vacuum cleaner in a desert violates common sense. We introduce a novel method, which we call Through the Looking Glass (TLG), to assess image common sense consistency using Large Vision-Language Models (LVLMs) and a Transformer-based encoder. By leveraging LVLMs to extract atomic facts from these images, we obtain a mix of accurate facts. We then fine-tune a compact attention-pooling classifier over the encoded atomic facts. TLG achieves new state-of-the-art performance on the WHOOPS! and WEIRD datasets while using a compact fine-tuning component.
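As a rough illustration of the final stage described in the abstract, the following minimal sketch shows what attention pooling over encoded atomic facts followed by a binary classifier might look like. It is not the authors' implementation: the LVLM fact extraction and the Transformer encoder are replaced here by hand-written toy vectors, the function names (`attention_pool`, `weirdness_probability`) and all parameters are hypothetical, and in practice the query and classifier weights would be learned during fine-tuning rather than set by hand.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(
    fact_embeddings: Sequence[Vector], query: Vector
) -> Tuple[List[float], List[float]]:
    """Pool fact embeddings into one vector.

    Each fact gets a weight softmax(query . embedding); the pooled
    vector is the weighted sum of the embeddings.
    """
    weights = softmax([dot(query, e) for e in fact_embeddings])
    dim = len(fact_embeddings[0])
    pooled = [
        sum(w * e[i] for w, e in zip(weights, fact_embeddings))
        for i in range(dim)
    ]
    return pooled, weights


def weirdness_probability(
    fact_embeddings: Sequence[Vector],
    query: Vector,
    clf_weights: Vector,
    clf_bias: float,
) -> float:
    """Logistic classifier on top of the pooled representation."""
    pooled, _ = attention_pool(fact_embeddings, query)
    logit = dot(clf_weights, pooled) + clf_bias
    return 1.0 / (1.0 + math.exp(-logit))


# Toy stand-ins for encoded atomic facts (assumed 2-dimensional here).
facts = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
# A zero query gives every fact the same attention weight.
query = [0.0, 0.0]
pooled, weights = attention_pool(facts, query)
prob = weirdness_probability(facts, query, clf_weights=[0.0, 0.0], clf_bias=0.0)
```

With a zero query the pooling reduces to a plain average of the fact embeddings; a learned query would instead let the classifier focus on the facts most indicative of a common sense violation.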
May-13-2025
- Country:
  - Asia
  - Europe
    - France > Île-de-France
    - Portugal > Lisbon
      - Lisbon (0.04)
    - Russia > Central Federal District
      - Moscow Oblast > Moscow (0.04)
    - Spain > Catalonia
      - Barcelona Province > Barcelona (0.04)
  - North America
    - Canada > Ontario
      - Toronto (0.04)
    - Mexico > Mexico City
      - Mexico City (0.04)
    - United States
      - California > Los Angeles County
        - Long Beach (0.04)
      - Hawaii > Honolulu County
        - Honolulu (0.04)
      - Louisiana > Orleans Parish
        - New Orleans (0.04)
      - Washington > King County
        - Seattle (0.04)
- Genre:
  - Research Report > New Finding (0.46)
- Technology: