SeePhys: Does Seeing Help Thinking? -- Benchmarking Vision-Based Physics Reasoning
Xiang, Kun, Li, Heng, Zhang, Terry Jingchen, Huang, Yinya, Liu, Zirong, Qu, Peixin, He, Jixi, Chen, Jiaqi, Yuan, Yu-Jie, Han, Jianhua, Xu, Hang, Li, Hanhui, Sachan, Mrinmaya, Liang, Xiaodan
arXiv.org Artificial Intelligence
We present SeePhys, a large-scale multimodal benchmark for LLM reasoning grounded in physics questions ranging from middle school to PhD qualifying exams. The benchmark covers 7 fundamental domains spanning the physics discipline, incorporating 21 categories of highly heterogeneous diagrams. In contrast to prior works where visual elements mainly serve auxiliary purposes, our benchmark features a substantial proportion of vision-essential problems (75%) that mandate visual information extraction for correct solutions. Through extensive evaluation, we observe that even the most advanced visual reasoning models (e.g., Gemini-2.5-pro and o4-mini) achieve sub-60% accuracy on our benchmark. These results reveal fundamental challenges in current large language models' visual understanding capabilities, particularly in: (i) establishing rigorous coupling between diagram interpretation and physics reasoning, and (ii) overcoming their persistent reliance on textual cues as cognitive shortcuts.
Oct-7-2025