Goto

Collaborating Authors

 Scripts & Frames


Semantic and episodic memories in a predictive coding model of the neocortex

arXiv.org Artificial Intelligence

Complementary Learning Systems theory holds that intelligent agents need two learning systems. Semantic memory is encoded in the neocortex with dense, overlapping representations and acquires structured knowledge. Episodic memory is encoded in the hippocampus with sparse, pattern-separated representations and quickly learns the specifics of individual experiences. Recently, this duality between semantic and episodic memories has been challenged by predictive coding, a biologically plausible neural network model of the neocortex which was shown to have hippocampus-like abilities on auto-associative memory tasks. These results raise the question of the episodic capabilities of the neocortex and their relation to semantic memory. In this paper, we present such a predictive coding model of the neocortex and explore its episodic capabilities. We show that this kind of model can indeed recall the specifics of individual examples but only if it is trained on a small number of examples. The model is overfitted to these exemples and does not generalize well, suggesting that episodic memory can arise from semantic learning. Indeed, a model trained with many more examples loses its recall capabilities. This work suggests that individual examples can be encoded gradually in the neocortex using dense, overlapping representations but only in a limited number, motivating the need for sparse, pattern-separated representations as found in the hippocampus.


Episodic Memory Representation for Long-form Video Understanding

arXiv.org Artificial Intelligence

Video Large Language Models (Video-LLMs) excel at general video understanding but struggle with long-form videos due to context window limits. Consequently, recent approaches focus on keyframe retrieval, condensing lengthy videos into a small set of informative frames. Despite their practicality, these methods simplify the problem to static text image matching, overlooking spatio temporal relationships crucial for capturing scene transitions and contextual continuity, and may yield redundant keyframes with limited information, diluting salient cues essential for accurate video question answering. To address these limitations, we introduce Video-EM, a training free framework inspired by the principles of human episodic memory, designed to facilitate robust and contextually grounded reasoning. Rather than treating keyframes as isolated visual entities, Video-EM explicitly models them as temporally ordered episodic events, capturing both spatial relationships and temporal dynamics necessary for accurately reconstructing the underlying narrative. Furthermore, the framework leverages chain of thought (CoT) thinking with LLMs to iteratively identify a minimal yet highly informative subset of episodic memories, enabling efficient and accurate question answering by Video-LLMs. Extensive evaluations on the Video-MME, EgoSchema, HourVideo, and LVBench benchmarks confirm the superiority of Video-EM, which achieves highly competitive results with performance gains of 4-9 percent over respective baselines while utilizing fewer frames.


OSGNet @ Ego4D Episodic Memory Challenge 2025

arXiv.org Artificial Intelligence

All tracks require precise localization of the interval within an untrimmed egocentric video. Previous unified video localization approaches often rely on late fusion strategies, which tend to yield suboptimal results. T o address this, we adopt an early fusion-based video localization model to tackle all three tasks, aiming to enhance localization accuracy. Ultimately, our method achieved first place in the Natural Language Queries, Goal Step, and Moment Queries tracks, demonstrating its effectiveness.


Reviews: Episodic Memory in Lifelong Language Learning

Neural Information Processing Systems

The paper addresses the very important topic of lifelong learning, and it proposes to employ an episodic memory to avoid catastrophic forgetting. The memory is based on a key-value representation that exploits an encoder-decoder architecture based on BERT. The training is made on the concatenation of different datasets, of which there is no need to specify the identifiers. The work is highly significant and the novelty of the contribution is remarkable. One point that would have deserved more attention is the strategies for the reading and writing of the episodic memory (see also comments below).


Reviews: Episodic Memory in Lifelong Language Learning

Neural Information Processing Systems

This paper proposes the use of memory in life-long learning to prevent catastrophic forgetting by means of experience replay and local adaptation. The idea is simple yet it is an interesting new step in this line of work. The paper would be a good addition to the conference, and has support from reviewers.


From Knowledge to Noise: CTIM-Rover and the Pitfalls of Episodic Memory in Software Engineering Agents

arXiv.org Artificial Intelligence

We introduce CTIM-Rover, an AI agent for Software Engineering (SE) built on top of AutoCodeRover (Zhang et al., 2024) that extends agentic reasoning frameworks with an episodic memory, more specifically, a general and repository-level Cross-Task-Instance Memory (CTIM). While existing open-source SE agents mostly rely on ReAct (Yao et al., 2023b), Reflexion (Shinn et al., 2023), or Code-Act (Wang et al., 2024), all of these reasoning and planning frameworks inefficiently discard their long-term memory after a single task instance. As repository-level understanding is pivotal for identifying all locations requiring a patch for fixing a bug, we hypothesize that SE is particularly well positioned to benefit from CTIM. For this, we build on the Experiential Learning (EL) approach ExpeL (Zhao et al., 2024), proposing a Mixture-Of-Experts (MoEs) inspired approach to create both a general-purpose and repository-level CTIM. We find that CTIM-Rover does not outperform AutoCodeRover in any configuration and thus conclude that neither ExpeL nor DoT-Bank (Lingam et al., 2024) scale to real-world SE problems. Our analysis indicates noise introduced by distracting CTIM items or exemplar trajectories as the likely source of the performance degradation.


Learning Structured Representations with Hyperbolic Embeddings

Neural Information Processing Systems

Most real-world datasets consist of a natural hierarchy between classes or an inherent label structure that is either already available or can be constructed cheaply. However, most existing representation learning methods ignore this hierarchy, treating labels as permutation invariant. Recent work [Zeng et al., 2022] proposes using this structured information explicitly, but the use of Euclidean distance may distort the underlying semantic context [Chen et al., 2013]. In this work, motivated by the advantage of hyperbolic spaces in modeling hierarchical relationships, we propose a novel approach HypStructure: a Hyperbolic Structured regularization approach to accurately embed the label hierarchy into the learned representations. HypStructure is a simple-yet-effective regularizer that consists of a hyperbolic tree-based representation loss along with a centering loss, and can be combined with any standard task loss to learn hierarchy-informed features.


Linking In-context Learning in Transformers to Human Episodic Memory

Neural Information Processing Systems

Understanding connections between artificial and biological intelligent systems can reveal fundamental principles of general intelligence. While many artificial intelligence models have a neuroscience counterpart, such connections are largely missing in Transformer models and the self-attention mechanism. Here, we examine the relationship between interacting attention heads and human episodic memory. We focus on induction heads, which contribute to in-context learning in Transformer-based large language models (LLMs). We demonstrate that induction heads are behaviorally, functionally, and mechanistically similar to the contextual maintenance and retrieval (CMR) model of human episodic memory.


Building spatial world models from sparse transitional episodic memories

arXiv.org Artificial Intelligence

Many animals possess a remarkable capacity to rapidly construct flexible mental models of their environments. These world models are crucial for ethologically relevant behaviors such as navigation, exploration, and planning. The ability to form episodic memories and make inferences based on these sparse experiences is believed to underpin the efficiency and adaptability of these models in the brain. Here, we ask: Can a neural network learn to construct a spatial model of its surroundings from sparse and disjoint episodic memories? We formulate the problem in a simulated world and propose a novel framework, the Episodic Spatial World Model (ESWM), as a potential answer. We show that ESWM is highly sample-efficient, requiring minimal observations to construct a robust representation of the environment. It is also inherently adaptive, allowing for rapid updates when the environment changes. In addition, we demonstrate that ESWM readily enables near-optimal strategies for exploring novel environments and navigating between arbitrary points, all without the need for additional training.


Momentum Boosted Episodic Memory for Improving Learning in Long-Tailed RL Environments

arXiv.org Artificial Intelligence

Traditional Reinforcement Learning (RL) algorithms assume the distribution of the data to be uniform or mostly uniform. However, this is not the case with most real-world applications like autonomous driving or in nature where animals roam. Some experiences are encountered frequently, and most of the remaining experiences occur rarely; the resulting distribution is called Zipfian. Taking inspiration from the theory of complementary learning systems, an architecture for learning from Zipfian distributions is proposed where important long tail trajectories are discovered in an unsupervised manner. The proposal comprises an episodic memory buffer containing a prioritised memory module to ensure important rare trajectories are kept longer to address the Zipfian problem, which needs credit assignment to happen in a sample efficient manner. The experiences are then reinstated from episodic memory and given weighted importance forming the trajectory to be executed. Notably, the proposed architecture is modular, can be incorporated in any RL architecture and yields improved performance in multiple Zipfian tasks over traditional architectures. Our method outperforms IMPALA by a significant margin on all three tasks and all three evaluation metrics (Zipfian, Uniform, and Rare Accuracy) and also gives improvements on most Atari environments that are considered challenging