Vision Language Models See What You Want but not What You See

Gao, Qingying; Li, Yijiang; Lyu, Haiyun; Sun, Haoran; Luo, Dezhi; Deng, Hokin

Dec-22-2024–arXiv.org Artificial Intelligence

Knowing others' intentions and taking others' perspectives are two core components of human intelligence that are typically considered instantiations of theory of mind. Endowing machines with these abilities is an important step toward building human-level artificial intelligence. Here we investigate intentionality understanding and perspective-taking in Vision Language Models (VLMs). For this purpose, we created the IntentBench and PerspectBench datasets, which contain over 400 cognitive experiments grounded in real-world scenarios and classic cognitive tasks. Surprisingly, on these two datasets VLMs achieve high performance in intentionality understanding but lower performance in perspective-taking. This challenges the common belief in the cognitive science literature that perspective-taking in the corresponding modality is necessary for intentionality understanding.

large language model, machine learning, natural language

Country:
- North America > United States
  - North Carolina (0.04)
  - Michigan (0.04)
  - New York > New York County
    - New York City (0.04)
  - Minnesota > Hennepin County
    - Minneapolis (0.04)
  - Massachusetts > Middlesex County
    - Cambridge (0.04)
  - California > San Diego County
    - San Diego (0.04)
- Europe > United Kingdom
  - England > Oxfordshire > Oxford (0.04)

Genre:
- Research Report
  - New Finding (1.00)
  - Experimental Study (1.00)

Industry:
- Education (0.46)
- Health & Medicine > Therapeutic Area
  - Neurology (0.47)

Technology:
- Information Technology > Artificial Intelligence
  - Cognitive Science (1.00)
  - Natural Language
    - Large Language Model (1.00)
    - Chatbot (0.73)
  - Machine Learning > Neural Networks
    - Deep Learning (0.73)
