TimeBench: A Comprehensive Evaluation of Temporal Reasoning Abilities in Large Language Models
Zheng Chu, Jingchang Chen, Qianglong Chen, Weijiang Yu, Haotian Wang, Ming Liu, Bing Qin
arXiv.org Artificial Intelligence
Understanding time is a pivotal aspect of human cognition and is crucial for grasping the intricacies of the world. Previous studies typically focus on specific aspects of time and lack a comprehensive temporal reasoning benchmark. To address this gap, we propose TimeBench, a comprehensive hierarchical temporal reasoning benchmark that covers a broad spectrum of temporal reasoning phenomena and provides a thorough evaluation of the temporal reasoning capabilities of large language models (LLMs). We conduct extensive experiments on popular LLMs, such as GPT-4, LLaMA2, and Mistral, including settings with chain-of-thought prompting. Our results reveal a significant performance gap between state-of-the-art LLMs and humans, showing that LLMs still have a considerable way to go in temporal reasoning. We hope TimeBench will serve as a comprehensive benchmark that fosters research on temporal reasoning in LLMs. Our resource is available at https://github.com/zchuz/TimeBench
Nov-29-2023
- Country:
- Africa
- Ethiopia > Addis Ababa
- Addis Ababa (0.04)
- Middle East > Egypt (0.04)
- Namibia > Khomas
- Windhoek (0.04)
- Rwanda > Kigali
- Kigali (0.04)
- Asia
- China
- Beijing > Beijing (0.04)
- Guangdong Province > Guangzhou (0.04)
- Heilongjiang Province > Harbin (0.04)
- Hong Kong (0.04)
- Hubei Province > Wuhan (0.04)
- Shaanxi Province > Xi'an (0.04)
- Zhejiang Province > Hangzhou (0.04)
- Middle East > UAE
- Abu Dhabi Emirate > Abu Dhabi (0.04)
- Europe
- Austria (0.04)
- Czechia > Prague (0.04)
- France > Provence-Alpes-Côte d'Azur
- Bouches-du-Rhône > Marseille (0.04)
- Italy > Tuscany
- Florence (0.04)
- Sweden > Uppsala County
- Uppsala (0.04)
- United Kingdom (0.14)
- North America
- Canada > Ontario
- Toronto (0.04)
- United States
- California > Santa Clara County
- Stanford (0.04)
- Colorado > Denver County
- Denver (0.04)
- Georgia > Fulton County
- Atlanta (0.04)
- Oceania > Australia
- Queensland (0.04)
- Genre:
- Research Report > New Finding (0.46)
- Industry:
- Education (0.68)
- Government (1.00)