Memory in Large Language Models: Mechanisms, Evaluation and Evolution
Zhang, Dianxing, Li, Wendong, Song, Kani, Lu, Jiaye, Li, Gang, Yang, Liuchun, Li, Sheng
arXiv.org Artificial Intelligence
We adopt a unified operational definition of LLM memory: a persistent state written during pretraining, finetuning, or inference that can later be addressed and that stably influences outputs. We propose a four-part taxonomy (parametric, contextual, external, procedural/episodic) and a memory quadruple (location, persistence, write/access path, controllability). We link mechanism, evaluation, and governance via the chain write -> read -> inhibit/update. To avoid distorted comparisons across heterogeneous setups, we adopt a three-setting protocol (parametric only, offline retrieval, online retrieval) that decouples capability from information availability on the same data and timeline. On this basis we build a layered evaluation: parametric (closed-book recall, edit differential, memorization/privacy), contextual (position curves and the mid-sequence drop), external (answer correctness versus snippet attribution/faithfulness), and procedural/episodic (cross-session consistency and timeline replay, E MARS+). The framework integrates temporal governance and leakage auditing (freshness hits, outdated answers, refusal slices), and it reports uncertainty via inter-rater agreement plus paired tests with multiple-comparison correction. For updating and forgetting, we present DMM Gov, which coordinates DAPT/TAPT, PEFT, model editing (ROME, MEND, MEMIT, SERAC), and RAG to form an auditable loop covering admission thresholds, rollout, monitoring, rollback, and change audits, with specifications for timeliness, conflict handling, and long-horizon consistency. Finally, we give four testable propositions: minimum identifiability; a minimal evaluation card; causally constrained editing with verifiable forgetting; and the conditions under which retrieval with small-window replay outperforms ultra-long-context reading. This yields a reproducible, comparable, and governable coordinate system for research and deployment.
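The taxonomy, the memory quadruple, and the three-setting protocol named in the abstract can be written down as plain data types. The following is a minimal sketch, not the authors' code: the class names (`MemoryType`, `EvalSetting`, `MemoryQuadruple`) are hypothetical, and the field values in `external_example` are illustrative assumptions about retrieval-style memory rather than definitions taken from the paper.

```python
from dataclasses import dataclass
from enum import Enum


class MemoryType(Enum):
    """The four-part taxonomy listed in the abstract."""
    PARAMETRIC = "parametric"
    CONTEXTUAL = "contextual"
    EXTERNAL = "external"
    PROCEDURAL_EPISODIC = "procedural/episodic"


class EvalSetting(Enum):
    """The three settings of the evaluation protocol."""
    PARAMETRIC_ONLY = "parametric only"
    OFFLINE_RETRIEVAL = "offline retrieval"
    ONLINE_RETRIEVAL = "online retrieval"


@dataclass(frozen=True)
class MemoryQuadruple:
    """One record per memory mechanism, using the four named fields."""
    location: str
    persistence: str
    write_access_path: str
    controllability: str


# Assumption: illustrative values only, not taken from the paper.
external_example = MemoryQuadruple(
    location="document index outside the model",
    persistence="until the entry is updated or deleted",
    write_access_path="indexing on write, retrieval on read",
    controllability="entries can be added, changed, or removed directly",
)
```

Describing each mechanism with the same four fields is what lets heterogeneous systems be compared along shared axes; the enum for settings makes it explicit which information was available to the model in a given run.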
Sep-24-2025
- Country:
- Asia
- China (0.04)
- Japan (0.04)
- Middle East
- Jordan (0.04)
- UAE > Abu Dhabi Emirate
- Abu Dhabi (0.04)
- Myanmar > Tanintharyi Region
- Dawei (0.04)
- Thailand > Bangkok
- Bangkok (0.04)
- Europe
- France (0.04)
- Italy > Calabria
- Catanzaro Province > Catanzaro (0.04)
- Romania > Sud - Muntenia Development Region
- Giurgiu County > Giurgiu (0.04)
- Ukraine > Kyiv Oblast
- Kyiv (0.04)
- United Kingdom (0.04)
- Genre:
- Research Report > Experimental Study (1.00)
- Industry:
- Energy (0.67)
- Health & Medicine (1.00)
- Information Technology > Security & Privacy (1.00)
- Law (1.00)
- Technology:
- Information Technology
- Artificial Intelligence
- Cognitive Science (1.00)
- Machine Learning > Neural Networks
- Deep Learning (0.47)
- Natural Language
- Chatbot (0.93)
- Large Language Model (1.00)
- Representation & Reasoning (1.00)
- Data Science > Data Mining (0.92)
- Information Management (1.00)
- Security & Privacy (1.00)