Episodic Memories Generation and Evaluation Benchmark for Large Language Models
Huet, Alexis, Houidi, Zied Ben, Rossi, Dario
arXiv.org Artificial Intelligence
Episodic memory, the ability to recall specific events grounded in time and space, is a cornerstone of human cognition, enabling not only coherent storytelling, but also planning and decision-making. Despite their remarkable capabilities, Large Language Models (LLMs) lack a robust mechanism for episodic memory: we argue that integrating episodic memory capabilities into LLMs is essential for advancing AI towards human-like cognition, increasing their potential to reason consistently and ground their output in real-world episodic events, hence avoiding confabulations. To address this challenge, we introduce a comprehensive framework to model and evaluate LLM episodic memory capabilities. Drawing inspiration from cognitive science, we develop a structured approach to represent episodic events, encapsulating temporal and spatial contexts, involved entities, and detailed descriptions. We synthesize a unique episodic memory benchmark, free from contamination, and release open source code and datasets to assess LLM performance across various recall and episodic reasoning tasks. Our evaluation of state-of-the-art models, including GPT-4 and Claude variants, Llama 3.1, and o1-mini, reveals that even the most advanced LLMs struggle with episodic memory tasks, particularly when dealing with multiple related events or complex spatio-temporal relationships, even in contexts as short as 10k-100k tokens.
Jan-20-2025
- Country:
  - Asia > China
  - Europe
    - Denmark (0.04)
    - France > Île-de-France
    - United Kingdom > England
      - Oxfordshire > Oxford (0.04)
  - North America > United States
    - California (0.04)
    - New York > New York County
      - New York City (0.04)
  - Oceania > Australia
    - New South Wales (0.04)
  - South America
    - Argentina > Pampas
      - Buenos Aires F.D. > Buenos Aires (0.04)
      - Buenos Aires Province (0.04)
    - Chile > Santiago Metropolitan Region
      - Santiago Province > Santiago (0.04)
- Genre:
  - Research Report > New Finding (1.00)
- Industry:
  - Health & Medicine
    - Consumer Health (1.00)
    - Therapeutic Area > Neurology (0.93)
- Technology: