episodic memory
- North America > United States > Texas > Travis County > Austin (0.14)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.05)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- (7 more...)
- Research Report (0.68)
- Instructional Material > Online (0.44)
- North America > United States > Iowa > Johnson County > Iowa City (0.14)
- North America > United States > California > San Diego County > San Diego (0.04)
- North America > Canada (0.04)
- Information Technology (0.68)
- Health & Medicine > Consumer Health (0.55)
- Education > Educational Setting > Continuing Education (0.47)
Improved Schemes for Episodic Memory-based Lifelong Learning
Current deep neural networks can achieve remarkable performance on a single task. However, when the deep neural network is continually trained on a sequence of tasks, it seems to gradually forget the previous learned knowledge. This phenomenon is referred to as catastrophic forgetting and motivates the field called lifelong learning. Recently, episodic memory based approaches such as GEM and A-GEM have shown remarkable performance. In this paper, we provide the first unified view of episodic memory based approaches from an optimization's perspective.
Rhea: Role-aware Heuristic Episodic Attention for Conversational LLMs
Hong, Wanyang, Zhang, Zhaoning, Chen, Yi, Zhang, Libo, Liu, Baihui, Qiao, Linbo, Tian, Zhiliang, Li, Dongsheng
Large Language Models (LLMs) have achieved remarkable performance on single-turn tasks, yet their effectiveness deteriorates in multi-turn conversations. We define this phenomenon as cumulative contextual decay - a progressive degradation of contextual integrity caused by attention pollution, dilution, and drift. To address this challenge, we propose Rhea (Role-aware Heuristic Episodic Attention), a novel framework that decouples conversation history into two functionally independent memory modules: (1) an Instructional Memory (IM) that persistently stores high-fidelity global constraints via a structural priority mechanism, and (2) an Episodic Memory (EM) that dynamically manages user-model interactions via asymmetric noise control and heuristic context retrieval. During inference, Rhea constructs a high signal-to-noise context by applying its priority attention: selectively integrating relevant episodic information while always prioritizing global instructions. To validate this approach, experiments on multiple multi-turn conversation benchmarks - including MT-Eval and Long-MT-Bench+ - show that Rhea mitigates performance decay and improves overall accuracy by 1.04 points on a 10-point scale (a 16% relative gain over strong baselines). Moreover, Rhea maintains near-perfect instruction fidelity (IAR > 8.1) across long-horizon interactions. These results demonstrate that Rhea provides a principled and effective framework for building more precise, instruction-consistent conversational LLMs.
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > Mexico > Mexico City > Mexico City (0.04)
- Asia > China > Hunan Province > Changsha (0.04)
From Memories to Maps: Mechanisms of In-Context Reinforcement Learning in Transformers
Humans and animals show remarkable learning efficiency, adapting to new environments with minimal experience. This capability is not well captured by standard reinforcement learning algorithms that rely on incremental value updates. Rapid adaptation likely depends on episodic memory -- the ability to retrieve specific past experiences to guide decisions in novel contexts. Transformers provide a useful setting for studying these questions because of their ability to learn rapidly in-context and because their key-value architecture resembles episodic memory systems in the brain. We train a transformer to in-context reinforcement learn in a distribution of planning tasks inspired by rodent behavior. We then characterize the learning algorithms that emerge in the model. We first find that representation learning is supported by in-context structure learning and cross-context alignment, where representations are aligned across environments with different sensory stimuli. We next demonstrate that the reinforcement learning strategies developed by the model are not interpretable as standard model-free or model-based planning. Instead, we show that in-context reinforcement learning is supported by caching intermediate computations within the model's memory tokens, which are then accessed at decision time. Overall, we find that memory may serve as a computational resource, storing both raw experience and cached computations to support flexible behavior. Furthermore, the representations developed in the model resemble computations associated with the hippocampal-entorhinal system in the brain, suggesting that our findings may be relevant for natural cognition. Taken together, our work offers a mechanistic hypothesis for the rapid adaptation that underlies in-context learning in artificial and natural settings.
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- Europe > Latvia > Lubāna Municipality > Lubāna (0.04)
WorldMM: Dynamic Multimodal Memory Agent for Long Video Reasoning
Yeo, Woongyeong, Kim, Kangsan, Yoon, Jaehong, Hwang, Sung Ju
Recent advances in video large language models have demonstrated strong capabilities in understanding short clips. However, scaling them to hours- or days-long videos remains highly challenging due to limited context capacity and the loss of critical visual details during abstraction. Existing memory-augmented methods mitigate this by leveraging textual summaries of video segments, yet they heavily rely on text and fail to utilize visual evidence when reasoning over complex scenes. Moreover, retrieving from fixed temporal scales further limits their flexibility in capturing events that span variable durations. To address this, we introduce WorldMM, a novel multimodal memory agent that constructs and retrieves from multiple complementary memories, encompassing both textual and visual representations. WorldMM comprises three types of memory: episodic memory indexes factual events across multiple temporal scales, semantic memory continuously updates high-level conceptual knowledge, and visual memory preserves detailed information about scenes. During inference, an adaptive retrieval agent iteratively selects the most relevant memory source and leverages multiple temporal granularities based on the query, continuing until it determines that sufficient information has been gathered. WorldMM significantly outperforms existing baselines across five long video question-answering benchmarks, achieving an average 8.4% performance gain over previous state-of-the-art methods, showing its effectiveness on long video reasoning.
- Europe > Austria > Vienna (0.14)
- Asia > India (0.04)
- North America > Canada > British Columbia > Vancouver (0.04)
- (2 more...)
- Research Report > New Finding (0.46)
- Research Report > Promising Solution (0.34)
- Leisure & Entertainment (0.93)
- Media (0.68)
- Information Technology > Security & Privacy (0.67)
- Health & Medicine > Consumer Health (0.57)
Episodic Memory in Agentic Frameworks: Suggesting Next Tasks
Fiorini, Sandro Rama, Azevedo, Leonardo G., Thiago, Raphael M., de Sousa, Valesca M., Labate, Anton B., da Silva, Viviane Torres
Agentic frameworks powered by Large Language Models (LLMs) can be useful tools in scientific workflows by enabling human-AI co-creation. A key challenge is recommending the next steps during workflow creation without relying solely on LLMs, which risk hallucination and require fine-tuning with scarce proprietary data. We propose an episodic memory architecture that stores and retrieves past workflows to guide agents in suggesting plausible next tasks. By matching current workflows with historical sequences, agents can recommend steps based on prior patterns.
- Europe > Austria > Vienna (0.14)
- South America > Brazil (0.06)
- North America > United States (0.05)
- Europe > Switzerland (0.04)
EchoVLA: Robotic Vision-Language-Action Model with Synergistic Declarative Memory for Mobile Manipulation
Lin, Min, Liang, Xiwen, Lin, Bingqian, Jingzhi, Liu, Jiao, Zijian, Li, Kehan, Ma, Yuhan, Liu, Yuecheng, Zhao, Shen, Zhuang, Yuzheng, Liang, Xiaodan
Recent progress in Vision-Language-Action (VLA) models has enabled embodied agents to interpret multimodal instructions and perform complex tasks. However, existing VLAs are mostly confined to short-horizon, table-top manipulation, lacking the memory and reasoning capability required for long-horizon mobile manipulation, where agents must coordinate navigation and manipulation under changing spatial contexts. In this work, we present EchoVLA, a memory-aware VLA model for long-horizon mobile manipulation. EchoVLA incorporates a synergistic declarative memory inspired by the human brain, consisting of a scene memory that maintains a collection of spatial-semantic maps and an episodic memory that stores task-level experiences with multimodal contextual features. During both training and inference, the two memories are individually stored, updated, and retrieved based on current observations, task history, and instructions, and their retrieved representations are fused via coarse- and fine-grained attention to guide mobile-arm diffusion policies. To support large-scale training and evaluation, we further introduce MoMani, an automated benchmark that generates expert-level long-horizon trajectories through multimodal large language model (MLLM)-guided planning and feedback-driven refinement, supplemented with real-robot demonstrations. Experiments in simulated and real-world settings show that EchoVLA improves long-horizon performance, reaching 0.52 SR on manipulation/navigation and 0.31 on mobile manipulation, exceeding $π_{0.5}$ by +0.08 and +0.11.
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.48)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.48)
Cognitively-Inspired Episodic Memory Architectures for Accurate and Efficient Character AI
Gonzalez, Rafael Arias, DiPaola, Steve
Large language models show promise for embodying historical characters in dialogue systems, but existing approaches face a critical trade-off: simple retrieval-augmented generation produces shallow responses, while multi-stage reflection achieves depth at prohibitive latency. We present an architecture that resolves this tension through offline data augmentation and efficient parallel retrieval from structured episodic memory. Our system transforms biographical data into 1,774 enriched first-person memories with affective-semantic metadata, then employs two-stage retrieval achieving 0.52s prompt generation. Evaluation using LLM-as-judge and RAGAs metrics shows our approach achieves parity with traditional RAG on GPT-4 while significantly outperforming it on smaller models (GPT-3.5, GPT-3), suggesting particular value for resource-constrained deployments. Beyond dialogue, the structured memory enables novel visualization tools: spatiotemporal heatmaps, emotional trajectory analysis, and interactive path tracking, positioning the system as both a dialogue interface and research tool for biographical analysis. We use Van Gogh as a test case, but the architecture is generalizable to any historical figure with substantial textual records, offering a practical framework for educational, museum, and research applications requiring both accuracy and efficiency
- Europe > Netherlands > South Holland > The Hague (0.04)
- Europe > France (0.04)
- Europe > Belgium (0.04)
- Health & Medicine > Consumer Health (0.87)
- Information Technology > Services (0.68)