Temporal Blind Spots in Large Language Models
Jonas Wallat, Adam Jatowt, Avishek Anand
arXiv.org Artificial Intelligence
Large language models (LLMs) have recently gained significant attention due to their unparalleled ability to perform various natural language processing tasks. These models, benefiting from their advanced natural language understanding capabilities, have demonstrated impressive zero-shot performance. However, the pre-training data utilized in LLMs is often confined to a specific corpus, resulting in inherent freshness and temporal scope limitations. Consequently, this raises concerns regarding the effectiveness of LLMs for tasks involving temporal intents. In this study, we aim to investigate the underlying limitations of general-purpose LLMs when deployed for tasks that require a temporal understanding. We pay particular attention to handling factual temporal knowledge through three popular temporal QA datasets. Specifically, we observe low performance on detailed questions about the past and, surprisingly, for rather new information. In manual and automatic testing, we find multiple temporal errors and characterize the conditions under which QA performance deteriorates. Our analysis contributes to understanding LLM limitations and offers valuable insights into developing future models that can better cater to the demands of temporally-oriented tasks. The code is available at https://github.com/jwallat/temporalblindspots.
Jan-22-2024
- Genre:
- Research Report > New Finding (1.00)