Human-like Episodic Memory for Infinite Context LLMs

Zafeirios Fountas, Martin A. Benfeghoul, Adnan Oomerjee, Fenia Christopoulou, Gerasimos Lampouras, Haitham Bou-Ammar, Jun Wang

arXiv.org Artificial Intelligence 

Large language models (LLMs) have shown remarkable capabilities, but they still struggle to process extensive contexts, limiting their ability to maintain coherence and accuracy over long sequences. In this work, we introduce EM-LLM, a novel approach that integrates key aspects of human episodic memory and event cognition into LLMs, enabling them to handle practically infinite context lengths while maintaining computational efficiency. EM-LLM organises sequences of tokens into coherent episodic events using a combination of Bayesian surprise and graph-theoretic boundary refinement in an online fashion. When needed, these events are retrieved through a two-stage memory process, combining similarity-based and temporally contiguous retrieval for efficient and human-like access to relevant information. Experiments on the LongBench dataset demonstrate EM-LLM's superior performance, outperforming the state-of-the-art InfLLM model with an overall relative improvement of 4.3% across various tasks, including a 33% improvement on the PassageRetrieval task. Furthermore, our analysis reveals strong correlations between EM-LLM's event segmentation and human-perceived events, suggesting a bridge between this artificial system and its biological counterpart. This work not only advances LLM capabilities in processing extended contexts but also provides a computational framework for exploring human memory mechanisms, opening new avenues for interdisciplinary research in AI and cognitive science.

For contemporary pre-trained large language models (LLMs), the context window serves as the primary mechanism for incorporating domain-specific, private, or up-to-date information. Yet this window is limited in length, a constraint that stems from inherent challenges in Transformer-based architectures. Recent studies have shown that Transformers struggle to extrapolate to contexts longer than their training window size (Kazemnejad et al., 2024). On top of this, applying softmax attention over extended token sequences requires substantial computational resources for each generated token, and the resulting attention embeddings risk becoming excessively noisy and losing their distinctiveness (Tworkowski et al., 2023). To mitigate these challenges, recent works have focused on retrieval-based methods, either in the form of in-context augmentation (e.g., RAG-based techniques (Lewis et al., 2020; Gao et al., 2024)) or via retrieval of previously inferred key-value (KV) pairs within individual attention heads (Wu et al., 2022; Tworkowski et al., 2023; Bertsch et al., 2023).
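To make the two mechanisms described above concrete, the sketch below illustrates surprise-based event segmentation and the two-stage (similarity plus temporal contiguity) retrieval. This is a minimal illustration under stated assumptions, not the paper's implementation: the per-token negative log-likelihood as a proxy for Bayesian surprise, the fixed threshold, the single representative key per event, and all function and parameter names (`segment_by_surprise`, `retrieve_events`, `n_contig`) are hypothetical choices for exposition, and the paper's graph-theoretic boundary refinement step is omitted.

```python
import numpy as np

def segment_by_surprise(token_nll, threshold=2.5):
    """Split a token stream into events at high-surprise tokens.

    token_nll: per-token negative log-likelihood from the LLM,
    used here as a simple proxy for Bayesian surprise (an assumption,
    not the paper's exact formulation). Returns (start, end) index
    pairs, one per event.
    """
    boundaries = [0]
    for i, s in enumerate(token_nll):
        if i > 0 and s > threshold:   # surprising token -> new event boundary
            boundaries.append(i)
    boundaries.append(len(token_nll))
    return list(zip(boundaries[:-1], boundaries[1:]))

def retrieve_events(query, event_keys, k=4, n_contig=1):
    """Two-stage retrieval: similarity-based, then temporally contiguous.

    query:      (d,) query vector for the current generation step
    event_keys: (n_events, d) one representative key per stored event
                (a simplifying assumption; real events hold many KV pairs)
    Stage 1 selects the k most similar events; stage 2 adds each
    selected event's immediate temporal neighbours.
    """
    sims = event_keys @ query
    selected = set(np.argsort(-sims)[:k].tolist())   # stage 1: similarity
    for idx in list(selected):                       # stage 2: contiguity
        for off in range(1, n_contig + 1):
            for j in (idx - off, idx + off):
                if 0 <= j < len(event_keys):
                    selected.add(j)
    return sorted(selected)                          # events in temporal order

# Example usage with toy data:
nll = np.array([1.0, 0.8, 3.1, 0.9, 1.2, 2.9, 0.7])
events = segment_by_surprise(nll)                    # [(0, 2), (2, 5), (5, 7)]
keys = np.random.randn(len(events), 16)
hits = retrieve_events(np.random.randn(16), keys, k=1)
```

The neighbour expansion in stage 2 mirrors the temporal contiguity effect in human free recall, where recalling one event tends to cue events that occurred adjacently in time.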
