The Stochastic Parrot on LLM's Shoulder: A Summative Assessment of Physical Concept Understanding
Yu, Mo, Liu, Lemao, Wu, Junjie, Chung, Tsz Ting, Zhang, Shunchi, Li, Jiangnan, Yeung, Dit-Yan, Zhou, Jie
arXiv.org Artificial Intelligence
We systematically investigate a widely asked question: do LLMs really understand what they say? This question relates to the more familiar term Stochastic Parrot. To this end, we propose a summative assessment over a carefully designed physical concept understanding task, PhysiCo. Our task alleviates the memorization issue by using grid-format inputs that abstractly describe physical phenomena. The grids represent varying levels of understanding, from the core phenomenon and application examples to analogies with other abstract patterns in the grid world. A comprehensive study on our task demonstrates that: (1) state-of-the-art LLMs, including GPT-4o, o1 and Gemini 2.0 Flash Thinking, lag behind humans by ~40%; (2) the stochastic parrot phenomenon is present in LLMs, as they fail on our grid task yet can describe and recognize the same concepts well in natural language; (3) our task challenges LLMs because of its intrinsic difficulty rather than the unfamiliar grid format, since in-context learning and fine-tuning on same-format data add little to their performance.
Feb-12-2025