Memory-Based Learning
TNT: Improving Chunkwise Training for Test-Time Memorization
Li, Zeman, Behrouz, Ali, Deng, Yuan, Zhong, Peilin, Kacham, Praneeth, Karami, Mahdi, Razaviyayn, Meisam, Mirrokni, Vahab
Recurrent neural networks (RNNs) with deep test-time memorization modules, such as Titans and TTT, represent a promising, linearly-scaling paradigm distinct from Transformers. While these expressive models do not yet match the peak performance of state-of-the-art Transformers, their potential has been largely untapped due to prohibitively slow training and low hardware utilization. Existing parallelization methods force a fundamental conflict governed by the chunksize hyperparameter: large chunks boost speed but degrade performance, necessitating a fixed, suboptimal compromise. To solve this challenge, we introduce TNT, a novel training paradigm that decouples training efficiency from inference performance through a two-stage process. Stage one is an efficiency-focused pre-training phase utilizing a hierarchical memory. A global module processes large, hardware-friendly chunks for long-range context, while multiple parallel local modules handle fine-grained details. Crucially, by periodically resetting local memory states, we break sequential dependencies to enable massive context parallelization. Stage two is a brief fine-tuning phase where only the local memory modules are adapted to a smaller, high-resolution chunksize, maximizing accuracy with minimal overhead. Evaluated on Titans and TTT models, TNT achieves a substantial acceleration in training speed (up to 17 times faster than the most accurate baseline configuration) while simultaneously improving model accuracy. This improvement removes a critical scalability barrier, establishing a practical foundation for developing expressive RNNs and facilitating future work to close the performance gap with Transformers.
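A minimal sketch of the chunkwise structure the abstract describes: a global memory is updated once per large chunk, while local memories are reset at every small-chunk boundary, so the local passes over different chunks carry no sequential dependency and could run in parallel. The function names and the simple decay update standing in for the actual Titans/TTT test-time learning rule are placeholders, not the paper's implementation.

```python
import numpy as np

def placeholder_update(state, token):
    # Stand-in memory update; the real Titans/TTT modules take a
    # gradient step on a reconstruction loss at test time.
    return 0.9 * state + 0.1 * token

def tnt_like_pass(tokens, global_chunk=512, local_chunk=64):
    """Sketch of TNT-style hierarchical chunkwise memory (assumed names).

    Global memory is carried sequentially across large, hardware-friendly
    chunks; local memories are reset at each local chunk boundary, breaking
    cross-chunk dependencies for the local pass.
    """
    d = tokens.shape[-1]
    global_state = np.zeros(d)
    outputs = []
    for g_start in range(0, len(tokens), global_chunk):
        g_chunk = tokens[g_start:g_start + global_chunk]
        # Local pass: state reset per local chunk -> no sequential coupling.
        for l_start in range(0, len(g_chunk), local_chunk):
            local_state = np.zeros(d)              # periodic reset
            for tok in g_chunk[l_start:l_start + local_chunk]:
                local_state = placeholder_update(local_state, tok)
                outputs.append(local_state + global_state)
        # Global pass: one sequential update per large chunk.
        global_state = placeholder_update(global_state, g_chunk.mean(axis=0))
    return np.stack(outputs)

# Tiny usage example on random "token embeddings".
out = tnt_like_pass(np.random.randn(2048, 16))
print(out.shape)  # (2048, 16)
```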
- North America > United States > California (0.14)
- Asia > Middle East > Jordan (0.04)
- South America > Colombia > Meta Department > Villavicencio (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (0.61)
Provable Separations between Memorization and Generalization in Diffusion Models
Ye, Zeqi, Zhu, Qijie, Tao, Molei, Chen, Minshuo
Diffusion models have achieved remarkable success across diverse domains, but they remain vulnerable to memorization -- reproducing training data rather than generating novel outputs. This not only limits their creative potential but also raises concerns about privacy and safety. While empirical studies have explored mitigation strategies, theoretical understanding of memorization remains limited. We address this gap by developing a dual-separation result from two complementary perspectives: statistical estimation and network approximation. From the estimation side, we show that the ground-truth score function does not minimize the empirical denoising loss, creating a separation that drives memorization. From the approximation side, we prove that implementing the empirical score function requires network size to scale with sample size, establishing a separation from the more compact network representation of the ground-truth score function. Guided by these insights, we develop a pruning-based method that reduces memorization while maintaining generation quality in diffusion transformers.
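For context, the empirical score the abstract refers to has a standard closed form for the Gaussian-smoothed empirical distribution, and writing it out makes the memorization mechanism concrete: the score is a softmax-weighted pull toward training points, so representing it exactly needs capacity that grows with the sample size. A small sketch under that standard formulation (this is illustrative background, not the paper's pruning method):

```python
import numpy as np

def empirical_score(x, data, sigma):
    """Score of the Gaussian-smoothed *empirical* distribution
    p_sigma(x) = (1/N) sum_i N(x; x_i, sigma^2 I).

    Following this score exactly drives samples back onto training points
    as sigma -> 0, which is the memorization behaviour being analyzed.
    """
    diffs = data - x                                      # (N, d)
    logits = -np.sum(diffs ** 2, axis=1) / (2 * sigma ** 2)
    w = np.exp(logits - logits.max())
    w /= w.sum()                                          # softmax weights over samples
    return (w[:, None] * diffs).sum(axis=0) / sigma ** 2

# Usage: with small sigma the score points almost exactly at the
# nearest training sample.
data = np.array([[0.0, 0.0], [4.0, 4.0]])
x = np.array([0.3, -0.2])
print(empirical_score(x, data, sigma=0.5))
```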
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
REMIND: Input Loss Landscapes Reveal Residual Memorization in Post-Unlearning LLMs
Cohen, Liran, Nemcovesky, Yaniv, Mendelson, Avi
Machine unlearning aims to remove the influence of specific training data from a model without requiring full retraining. This capability is crucial for ensuring privacy, safety, and regulatory compliance. Therefore, verifying whether a model has truly forgotten target data is essential for maintaining reliability and trustworthiness. However, existing evaluation methods often assess forgetting at the level of individual inputs. This approach may overlook residual influence present in semantically similar examples. Such influence can compromise privacy and lead to indirect information leakage. We propose REMIND (Residual Memorization In Neighborhood Dynamics), a novel evaluation method that detects the subtle remaining influence of unlearned data and classifies whether the data has been effectively forgotten. REMIND analyzes the model's loss over small input variations and reveals patterns unnoticed by single-point evaluations. We show that unlearned data yield flatter, less steep loss landscapes, while retained or unrelated data exhibit sharper, more volatile patterns. REMIND requires only query-based access, outperforms existing methods under similar constraints, and demonstrates robustness across different models, datasets, and paraphrased inputs, making it practical for real-world deployment. By providing a more sensitive and interpretable measure of unlearning effectiveness, REMIND gives a reliable framework for assessing unlearning in language models and, as a result, offers a novel perspective on memorization and unlearning.
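A rough sketch of the neighborhood-loss idea from the REMIND abstract, assuming only query access to a loss function and a user-supplied set of small input variations; the sharpness statistics below (standard deviation and range of the losses) are illustrative choices, not the paper's exact estimator:

```python
import numpy as np

def remind_like_score(loss_fn, neighbors):
    """Neighborhood sharpness sketch (assumed interface).

    `loss_fn` is any query-only callable returning the model loss on a text;
    `neighbors` contains the target input plus small variations of it
    (paraphrases, token swaps, ...). Per the abstract, unlearned data should
    show a flatter (lower-spread) landscape than retained data.
    """
    losses = np.array([loss_fn(t) for t in neighbors])
    return {
        "mean_loss": float(losses.mean()),
        "sharpness": float(losses.std()),          # spread over the neighborhood
        "range": float(losses.max() - losses.min()),
    }

# Usage with a toy stand-in loss.
toy_loss = lambda text: float(len(text)) % 7 / 7.0
print(remind_like_score(toy_loss, ["the cat sat", "the cat sat.", "a cat sat"]))
```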
- North America > United States > Virginia (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Asia > Middle East > Israel (0.04)
- Law (1.00)
- Information Technology > Security & Privacy (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (0.84)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)
Analyzing the Power of Chain of Thought through Memorization Capabilities
Yu, Lijia, Gao, Xiao-Shan, Zhang, Lijun
It has been shown that the chain of thought (CoT) can enhance the power of large language models (LLMs) to solve certain mathematical reasoning problems. However, the capacity of CoT is still not fully explored. As an important instance, the following basic question has not yet been answered: Does CoT expand the capability of transformers across all reasoning tasks? We demonstrate that reasoning with transformers is essentially a memorization problem for reasoning datasets. Thus, examining the power of CoT across all reasoning tasks amounts to analyzing the memorization capabilities of CoT transformers. In this paper, we give a complete description of the memorization capabilities of fixed-precision transformers with or without CoT and give a negative answer to the above-mentioned question. Precisely, we first give necessary and sufficient conditions for fixed-precision transformers with and without CoT to memorize a finite reasoning dataset and show that these two conditions do not imply each other. Then, we give lower and upper bounds for the number of parameters needed for transformers with or without CoT to memorize a finite reasoning dataset with $N$ elements, which are $\overline{\Theta}(N)$ in all cases. This implies that there exist reasoning tasks for which CoT does not enhance the reasoning power of transformers, leading to a negative answer to the above-mentioned question. Finally, we give the first results on memorizing infinite reasoning datasets by CoT transformers and show that some simple infinite datasets cannot be memorized by transformers with or without CoT.
- North America > United States (0.14)
- Europe > United Kingdom (0.04)
- Asia > Middle East > Jordan (0.04)
- Asia > China > Jiangsu Province > Nanjing (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.86)
Memorization: A Close Look at Books
Ma, Iris, Domingo, Ian, Krone-Martins, Alberto, Baldi, Pierre, Lopes, Cristina V.
To what extent can entire books be extracted from LLMs? Using the Llama 3 70B family of models, and the "prefix-prompting" extraction technique, we were able to auto-regressively reconstruct, with a very high level of similarity, one entire book (Alice's Adventures in Wonderland) from just the first 500 tokens. We were also able to obtain high extraction rates on several other books, piecewise. However, these successes do not extend uniformly to all books. We show that extraction rates of books correlate with book popularity and thus, most likely, with duplication in the training data. We also confirm the undoing of mitigations in the instruction-tuned Llama 3.1, following recent work (Nasr et al., 2025). We further find that this undoing comes from changes to only a tiny fraction of weights concentrated primarily in the lower transformer blocks. Our results provide evidence of the limits of current regurgitation mitigation strategies and introduce a framework for studying how fine-tuning affects the retrieval of verbatim memorization in aligned LLMs.
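The prefix-prompting setup the abstract describes is straightforward to sketch: tokenize a book, feed the opening tokens, decode greedily, and compare the continuation to the real text. The model name argument and the difflib similarity ratio below are placeholders for illustration; the paper works with the Llama 3 70B family, 500-token prefixes, and its own similarity measure.

```python
from difflib import SequenceMatcher
from transformers import AutoModelForCausalLM, AutoTokenizer

def prefix_extraction_rate(model_name, book_text, prefix_tokens=500, new_tokens=500):
    """Greedy continuation from a book prefix, scored against the true text.

    Sketch only: the similarity metric here is a crude stand-in, and any
    sufficiently long `book_text` works as input.
    """
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    ids = tok(book_text, return_tensors="pt").input_ids
    prefix = ids[:, :prefix_tokens]
    reference = ids[:, prefix_tokens:prefix_tokens + new_tokens]
    out = model.generate(prefix, max_new_tokens=new_tokens, do_sample=False)  # greedy
    generated = out[:, prefix_tokens:]   # drop the prompt from the output
    return SequenceMatcher(None,
                           tok.decode(generated[0]),
                           tok.decode(reference[0])).ratio()
```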
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > Gulf of Mexico > Central GOM (0.04)
- Asia > Singapore (0.04)
- (3 more...)
- Law (1.00)
- Information Technology > Security & Privacy (0.68)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (0.65)
The Cost of Robustness: Tighter Bounds on Parameter Complexity for Robust Memorization in ReLU Nets
Kim, Yujun, Moon, Chaewon, Yun, Chulhee
We study the parameter complexity of robust memorization for $\mathrm{ReLU}$ networks: the number of parameters required to interpolate any given dataset with $\varepsilon$-separation between differently labeled points, while ensuring predictions remain consistent within a $\mu$-ball around each training sample. We establish upper and lower bounds on the parameter count as a function of the robustness ratio $\rho = \mu/\varepsilon$. Unlike prior work, we provide a fine-grained analysis across the entire range $\rho \in (0,1)$ and obtain tighter upper and lower bounds that improve upon existing results. Our findings reveal that the parameter complexity of robust memorization matches that of non-robust memorization when $\rho$ is small, but grows with increasing $\rho$.
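Written out from the abstract, the setting is the following (the choice of norm is an assumption; the paper may fix a specific one):

```latex
% eps-separated dataset, mu-robust interpolation, robustness ratio rho.
\[
\|x_i - x_j\| \ge \varepsilon \ \text{ whenever } y_i \ne y_j,
\qquad
f(x') = y_i \ \text{ for all } \|x' - x_i\| \le \mu,
\qquad
\rho := \frac{\mu}{\varepsilon} \in (0,1).
\]
```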
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)
Massive Memorization with Hundreds of Trillions of Parameters for Sequential Transducer Generative Recommenders
Chen, Zhimin, Zhao, Chenyu, Mo, Ka Chun, Jiang, Yunjiang, Lee, Jane H., Chen, Shouwei, Mahajan, Khushhall Chandra, Jiang, Ning, Ren, Kai, Li, Jinhui, Yang, Wen-Yun
Modern large-scale recommendation systems rely heavily on user interaction history sequences to enhance model performance. The advent of large language models and sequential modeling techniques, particularly transformer-like architectures, has led to significant advancements recently (e.g., HSTU, SIM, and TWIN models). While scaling to ultra-long user histories (10k to 100k items) generally improves model performance, it also creates significant challenges in latency, queries per second (QPS), and GPU cost in industry-scale recommendation systems. Existing models do not adequately address these industrial scalability issues. In this paper, we propose a novel two-stage modeling framework, namely VIrtual Sequential Target Attention (VISTA), which decomposes traditional target attention from a candidate item to user history items into two distinct stages: (1) user history summarization into a few hundred tokens, followed by (2) candidate item attention to those tokens. These summarization token embeddings are then cached in a storage system and utilized as sequence features for downstream model training and inference. This novel design for scalability enables VISTA to scale to lifelong user histories (up to one million items) while keeping downstream training and inference costs fixed, which is essential in industry. Our approach achieves significant improvements in offline and online metrics and has been successfully deployed on an industry-leading recommendation platform serving billions of users.
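A toy sketch of the two-stage decomposition described in the abstract: stage one compresses a long history into a small set of summary token embeddings that can be cached, and stage two runs target attention from a candidate item to those cached tokens. Everything here (random query vectors, plain dot-product attention, the scoring rule) is a placeholder around that structure, not the VISTA architecture itself.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def summarize_history(history, num_summary_tokens=256):
    """Stage 1 (sketch): compress a long user history (L, d) into a small,
    cacheable set of summary token embeddings via attention from query
    vectors. The queries are random placeholders; in a real system they
    would be trained parameters."""
    d = history.shape[-1]
    queries = np.random.randn(num_summary_tokens, d) / np.sqrt(d)
    attn = softmax(queries @ history.T / np.sqrt(d))        # (S, L)
    return attn @ history                                    # (S, d) -> cache this

def score_candidate(candidate, summary_tokens):
    """Stage 2 (sketch): target attention from one candidate item embedding
    to the cached summary tokens, then a dot-product score."""
    d = candidate.shape[-1]
    attn = softmax(candidate @ summary_tokens.T / np.sqrt(d))   # (S,)
    context = attn @ summary_tokens                              # (d,)
    return float(candidate @ context)

# Usage: a long history is summarized once, then reused for every candidate.
history = np.random.randn(10_000, 32)
cache = summarize_history(history, num_summary_tokens=256)
print(score_candidate(np.random.randn(32), cache))
```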
- North America > United States > New York > New York County > New York City (0.04)
- Asia > China > Hong Kong (0.04)
- Oceania > Australia > Queensland (0.04)
- (5 more...)
Generalization or Memorization: Dynamic Decoding for Mode Steering
Large Language Models (LLMs) exhibit a troubling duality, capable of both remarkable generalization and brittle, verbatim memorization of their training data. This unpredictability undermines their reliability in high-stakes applications. In this work, we propose a unified framework to understand, identify, and control these distinct reasoning modes. First, we introduce a theoretical model based on the Information Bottleneck (IB) principle, formalizing generalization as the learning of a compressed, task-relevant representation and memorization as a failure to compress. Building on this theory, we develop Dynamic Mode Steering (DMS), a novel inference-time algorithm which comprises two components: (1) a lightweight, causally-grounded linear probe that identifies the model's instantaneous reliance on memorization, and (2) a dynamic activation steering mechanism that nudges the model's computation towards pre-identified generalization circuits. We frame DMS as a form of adaptive, self-contrastive decoding. Experiments on reasoning and faithfulness tasks demonstrate that DMS significantly improves logical consistency and factual accuracy, thereby offering a principled approach to enhancing LLM reliability.
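A small sketch of the two components the abstract names, with assumed names and an assumed linear form of the intervention: a linear probe reads a hidden state and estimates reliance on memorization, and the steering step adds a pre-identified "generalization" direction scaled by that estimate.

```python
import numpy as np

def memorization_probe(hidden, w, b):
    """Lightweight linear probe (sketch): estimated probability that the
    current forward pass relies on memorization, read off one hidden state.
    `w` and `b` would be fit on labeled memorization/generalization examples."""
    return 1.0 / (1.0 + np.exp(-(hidden @ w + b)))

def steer(hidden, generalization_direction, p_mem, alpha=2.0):
    """Dynamic steering (sketch): nudge the activation toward a
    pre-identified generalization direction, scaled by how strongly the
    probe detects memorization. The additive form is an assumption,
    not the paper's exact operator."""
    d = generalization_direction / np.linalg.norm(generalization_direction)
    return hidden + alpha * p_mem * d

# Toy usage on a random hidden state.
h = np.random.randn(64)
w, b = np.random.randn(64), 0.0
p = memorization_probe(h, w, b)
print(p, steer(h, np.random.randn(64), p)[:4])
```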
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- North America > Canada (0.04)
- Research Report (0.64)
- Overview (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.95)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.91)
This EEG Looks Like These EEGs: Interpretable Interictal Epileptiform Discharge Detection With ProtoEEG-kNN
Tang, Dennis, Donnelly, Jon, Barnett, Alina Jade, Semenova, Lesia, Jing, Jin, Hadar, Peter, Karakis, Ioannis, Selioutski, Olga, Zhao, Kehan, Westover, M. Brandon, Rudin, Cynthia
The presence of interictal epileptiform discharges (IEDs) in electroencephalogram (EEG) recordings is a critical biomarker of epilepsy. Even trained neurologists find detecting IEDs difficult, leading many practitioners to turn to machine learning for help. While existing machine learning algorithms can achieve strong accuracy on this task, most models are uninterpretable and cannot justify their conclusions. Absent the ability to understand model reasoning, doctors cannot leverage their expertise to identify incorrect model predictions and intervene accordingly. To improve the human-model interaction, we introduce ProtoEEG-kNN, an inherently interpretable model that follows a simple case-based reasoning process. ProtoEEG-kNN reasons by comparing an EEG to similar EEGs from the training set and visually demonstrates its reasoning both in terms of IED morphology (shape) and spatial distribution (location). We show that ProtoEEG-kNN can achieve state-of-the-art accuracy in IED detection while providing explanations that experts prefer over existing approaches.
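The case-based reasoning loop the abstract describes reduces, at its core, to a nearest-neighbor lookup in a learned embedding space, with the retrieved neighbors themselves serving as the explanation. A minimal sketch with random embeddings standing in for the learned EEG feature extractor (the real model additionally visualizes IED morphology and spatial distribution):

```python
import numpy as np

def knn_explain(query_emb, train_embs, train_labels, k=5):
    """Classify an EEG segment by its k nearest training segments and
    return those neighbors as the 'this EEG looks like these EEGs'
    explanation. The embedding function is assumed given."""
    dists = np.linalg.norm(train_embs - query_emb, axis=1)
    nn_idx = np.argsort(dists)[:k]
    votes = train_labels[nn_idx]
    prediction = int(round(votes.mean()))     # majority vote for a binary IED label
    return prediction, nn_idx, dists[nn_idx]

# Toy usage with random embeddings (1 = contains an IED, 0 = does not).
train_embs = np.random.randn(200, 16)
train_labels = np.random.randint(0, 2, size=200)
pred, neighbors, d = knn_explain(np.random.randn(16), train_embs, train_labels)
print(pred, neighbors, d)
```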
- North America > United States > New York > Suffolk County > Stony Brook (0.04)
- North America > United States > Massachusetts (0.04)
- Europe > Greece (0.04)
- Asia > Middle East > Israel (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning (0.54)
- Information Technology > Artificial Intelligence > Natural Language > Explanation & Argumentation (0.48)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)