Time Awareness in Large Language Models: Benchmarking Fact Recall Across Time

David Herel, Vojtech Bartek, Tomas Mikolov

Sep-20-2024, arXiv.org Artificial Intelligence

Who is the US President? The answer changes depending on when the question is asked. While large language models (LLMs) are evaluated on various reasoning tasks, they often miss a crucial dimension: time. In real-world scenarios, the correctness of answers is frequently tied to temporal context. In this paper, we introduce a novel dataset designed to rigorously test LLMs' ability to handle time-sensitive facts. Our benchmark offers a systematic way to measure how well LLMs align their knowledge with the correct time context, filling a key gap in current evaluation methods and offering a valuable tool for improving real-world applicability in future models.
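The opening question can be made concrete with a small sketch. The code below is not the paper's dataset or scoring method. It is a hypothetical illustration, assuming each fact is stored with a validity interval and that a prediction counts as correct only if it matches the answer valid on the date the question refers to. The names `US_PRESIDENT`, `answer_as_of`, and `time_aware_accuracy` are invented for this example.

```python
from datetime import date

# Illustrative subset of records: (answer, valid_from, valid_until).
# Intervals are half-open: valid_from <= day < valid_until.
US_PRESIDENT = [
    ("Barack Obama", date(2009, 1, 20), date(2017, 1, 20)),
    ("Donald Trump", date(2017, 1, 20), date(2021, 1, 20)),
    ("Joe Biden", date(2021, 1, 20), date(2025, 1, 20)),
]


def answer_as_of(records, when):
    """Return the answer valid on `when`, or None if no record covers it."""
    for answer, start, end in records:
        if start <= when < end:
            return answer
    return None


def time_aware_accuracy(records, predictions):
    """Fraction of (date, guess) pairs whose guess matches the answer valid on that date."""
    if not predictions:
        return 0.0
    correct = sum(answer_as_of(records, when) == guess for when, guess in predictions)
    return correct / len(predictions)


# Hypothetical model outputs: the second one is a stale answer for its date.
predictions = [
    (date(2016, 6, 1), "Barack Obama"),
    (date(2019, 6, 1), "Barack Obama"),
    (date(2022, 6, 1), "Joe Biden"),
]

print(answer_as_of(US_PRESIDENT, date(2019, 6, 1)))
print(time_aware_accuracy(US_PRESIDENT, predictions))
```

Here the same guess, "Barack Obama", is scored as correct for a 2016 question and incorrect for a 2019 one, which is the kind of time dependence the abstract describes.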

