SeePhys: Does Seeing Help Thinking? -- Benchmarking Vision-Based Physics Reasoning
Xiang, Kun; Li, Heng; Zhang, Terry Jingchen; Huang, Yinya; Liu, Zirong; Qu, Peixin; He, Jixi; Chen, Jiaqi; Yuan, Yu-Jie; Han, Jianhua; Xu, Hang; Li, Hanhui; Sachan, Mrinmaya; Liang, Xiaodan
arXiv.org Artificial Intelligence
We present SeePhys, a large-scale multimodal benchmark for LLM reasoning grounded in physics questions ranging from middle school to PhD qualifying exams. The benchmark covers 7 fundamental domains spanning the physics discipline and incorporates 21 categories of highly heterogeneous diagrams. In contrast to prior work, where visual elements mainly serve auxiliary purposes, our benchmark features a substantial proportion of vision-essential problems (75%) that cannot be solved correctly without extracting information from the diagram. Through extensive evaluation, we observe that even the most advanced visual reasoning models (e.g., Gemini-2.5-pro and o4-mini) achieve sub-60% accuracy on our benchmark. These results reveal fundamental challenges in current large language models' visual understanding capabilities, particularly in (i) establishing rigorous coupling between diagram interpretation and physics reasoning, and (ii) overcoming their persistent reliance on textual cues as cognitive shortcuts.
Oct-7-2025
- Country:
- Asia (0.93)
- North America > United States (0.28)
- Europe > Austria (0.28)
- Genre:
- Research Report > New Finding (0.67)
- Instructional Material > Course Syllabus & Notes (0.46)
- Industry:
- Education > Educational Setting
- Higher Education (0.46)
- K-12 Education (0.35)
- Technology: