Through the Looking Glass: Common Sense Consistency Evaluation of Weird Images

Rykov, Elisei, Petrushina, Kseniia, Titova, Kseniia, Razzhigaev, Anton, Panchenko, Alexander, Konovalov, Vasily

May-13-2025–arXiv.org Artificial Intelligence

Measuring how realistic images look is a complex task in artificial intelligence research. For example, an image of a boy with a vacuum cleaner in a desert violates common sense. We introduce a novel method, which we call Through the Looking Glass (TLG), to assess image common sense consistency using Large Vision-Language Models (LVLMs) and a Transformer-based encoder. By leveraging LVLMs to extract atomic facts from these images, we obtain a mix of accurate facts. We proceed by fine-tuning a compact attention-pooling classifier over the encoded atomic facts. TLG achieves new state-of-the-art performance on the WHOOPS! and WEIRD datasets while relying on a compact fine-tuning component.
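To make the last step of the pipeline concrete, the following is a minimal sketch of attention pooling over encoded atomic facts followed by a linear classifier. It is an illustration under stated assumptions, not the authors' implementation: the function names (attention_pool, weirdness_probability), the single query vector, and the toy two-dimensional embeddings are all hypothetical. In a real setup the embeddings would come from a Transformer-based encoder, and the query and classifier parameters would be learned during fine-tuning.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(
    fact_embeddings: Sequence[Sequence[float]],
    query: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Pool fact embeddings into one vector.

    Each fact is scored by its dot product with a query vector
    (hypothetical here; it would be a learned parameter), the scores
    are normalized with softmax, and the embeddings are averaged
    with those weights.
    """
    scores = [sum(q * x for q, x in zip(query, emb)) for emb in fact_embeddings]
    weights = softmax(scores)
    dim = len(query)
    pooled = [
        sum(w * emb[d] for w, emb in zip(weights, fact_embeddings))
        for d in range(dim)
    ]
    return pooled, weights


def weirdness_probability(
    fact_embeddings: Sequence[Sequence[float]],
    query: Sequence[float],
    clf_weights: Sequence[float],
    clf_bias: float,
) -> float:
    """Apply a linear layer and a sigmoid to the pooled vector.

    The classifier parameters are placeholders, not trained values.
    """
    pooled, _ = attention_pool(fact_embeddings, query)
    logit = sum(w * p for w, p in zip(clf_weights, pooled)) + clf_bias
    return 1.0 / (1.0 + math.exp(-logit))


# Toy stand-ins for the encoded atomic facts of one image.
facts = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pooled, weights = attention_pool(facts, query=[2.0, 0.0])
prob = weirdness_probability(facts, [2.0, 0.0], clf_weights=[0.5, -0.5], clf_bias=0.0)
```

The attention weights always sum to one, so the pooled vector is a weighted average of the fact embeddings; facts that align with the query contribute more to the final score than the rest.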

large language model, machine learning, natural language

Country:
- Europe (1.00)
- North America > United States (0.94)
- Asia (0.68)

Genre:
- Research Report > New Finding (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.90)
