Memory-Based Learning
Reason to Rote: Rethinking Memorization in Reasoning
Du, Yupei, Mondorf, Philipp, Casola, Silvia, Yao, Yuekun, Litschko, Robert, Plank, Barbara
Large language models readily memorize arbitrary training instances, such as label noise, yet they perform strikingly well on reasoning tasks. In this work, we investigate how language models memorize label noise, and why such memorization in many cases does not heavily affect generalizable reasoning capabilities. Using two controllable synthetic reasoning datasets with noisy labels, four-digit addition (FDA) and two-hop relational reasoning (THR), we discover a reliance of memorization on generalizable reasoning mechanisms: models continue to compute intermediate reasoning outputs even when retrieving memorized noisy labels, and intervening reasoning adversely affects memorization. We further show that memorization operates through distributed encoding, i.e., aggregating various inputs and intermediate results, rather than building a look-up mechanism from inputs to noisy labels. Moreover, our FDA case study reveals memorization occurs via outlier heuristics, where existing neuron activation patterns are slightly shifted to fit noisy labels. Together, our findings suggest that memorization of label noise in language models builds on, rather than overrides, the underlying reasoning mechanisms, shedding lights on the intriguing phenomenon of benign memorization.
- Europe > Austria > Vienna (0.14)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- (5 more...)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)
Export Reviews, Discussions, Author Feedback and Meta-Reviews
First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. This paper proposes the Latent Case Model (LCM), a Bayesian approach to clustering in which clusters are represented by a prototype (a specific sample from the data) and feature subspaces (a binary subset of the variables signifying those features that are relevant to the class). The approach is presented as being a Bayesian, trainable version of the Case-Based Reasoning approach popular in AI, and is motivated by the ways such models have proved highly effective in explaining human decision making. The generative model (Figure 1) represents each item as coming from a mixture of S clusters, where each cluster is represented by a prototype and subspace (as above) and a function \phi which generates features matching those of the prototype with high probability for features in the subspace, and uniform features outside it. The model is thus similar in functionality to LDA but quite different in terms of its representation.
- Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (0.75)
- Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning (0.60)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.49)
- (2 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (0.85)
- Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning (0.85)
Object-Centric Case-Based Reasoning via Argumentation
Gaul, Gabriel de Olim, Gould, Adam, Kori, Avinash, Toni, Francesca
We introduce Slot Attention Argumentation for Case-Based Reasoning (SAA-CBR), a novel neuro-symbolic pipeline for image classification that integrates object-centric learning via a neural Slot Attention (SA) component with symbolic reasoning conducted by Abstract Argumentation for Case-Based Reasoning (AA-CBR). We explore novel integrations of AA-CBR with the neural component, including feature combination strategies, casebase reduction via representative samples, novel count-based partial orders, a One-Vs-Rest strategy for extending AA-CBR to multi-class classification, and an application of Supported AA-CBR, a bipolar variant of AA-CBR. We demonstrate that SAA-CBR is an effective classifier on the CLEVR-Hans datasets, showing competitive performance against baseline models.
- Europe > United Kingdom > England > Greater London > London (0.14)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Europe > Italy (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Explanation & Argumentation (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
The Bayesian Case Model: A Generative Approach for Case-Based Reasoning and Prototype Classification
We present the Bayesian Case Model (BCM), a general framework for Bayesian case-based reasoning (CBR) and prototype classification and clustering. BCM brings the intuitive power of CBR to a Bayesian generative framework. The BCM learns prototypes, the ``quintessential observations that best represent clusters in a dataset, by performing joint inference on cluster labels, prototypes and important features. Simultaneously, BCM pursues sparsity by learning subspaces, the sets of features that play important roles in the characterization of the prototypes. The prototype and subspace representation provides quantitative benefits in interpretability while preserving classification accuracy. Human subject experiments verify statistically significant improvements to participants' understanding when using explanations produced by BCM, compared to those given by prior art.
- Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (0.70)
- Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning (0.70)
Entropy-Memorization Law: Evaluating Memorization Difficulty of Data in LLMs
Huang, Yizhan, Yang, Zhe, Chen, Meifang, Nianchen, Huang, Zhang, Jianping, Lyu, Michael R.
Large Language Models (LLMs) are known to memorize portions of their training data, sometimes reproducing content verbatim when prompted appropriately. In this work, we investigate a fundamental yet under-explored question in the domain of memorization: How to characterize memorization difficulty of training data in LLMs? Through empirical experiments on OLMo, a family of open models, we present the Entropy-Memorization Law. It suggests that data entropy is linearly correlated with memorization score. Moreover, in a case study of memorizing highly randomized strings, or "gibberish", we observe that such sequences, despite their apparent randomness, exhibit unexpectedly low empirical entropy compared to the broader training corpus. Adopting the same strategy to discover Entropy-Memorization Law, we derive a simple yet effective approach to distinguish training and testing data, enabling Dataset Inference (DI).
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- North America > United States > California > Orange County > Anaheim (0.04)
- (7 more...)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)
Scrub It Out! Erasing Sensitive Memorization in Code Language Models via Machine Unlearning
Chu, Zhaoyang, Wan, Yao, Zhang, Zhikun, Wang, Di, Yang, Zhou, Zhang, Hongyu, Zhou, Pan, Shi, Xuanhua, Jin, Hai, Lo, David
While Code Language Models (CLMs) have demonstrated superior performance in software engineering tasks such as code generation and summarization, recent empirical studies reveal a critical privacy vulnerability: these models exhibit unintended memorization of sensitive training data, enabling verbatim reproduction of confidential information when specifically prompted. To address this issue, several approaches, including training data de-duplication and differential privacy augmentation, have been proposed. However, these methods require full-model retraining for deployed CLMs, which incurs substantial computational costs. In this paper, we aim to answer the following research question: Can sensitive information memorized by CLMs be erased effectively and efficiently? We conduct a pioneering investigation into erasing sensitive memorization in CLMs through machine unlearning - a post-hoc modification method that removes specific information from trained models without requiring full retraining. Specifically, we first quantify the memorization risks of sensitive data within CLM training datasets and curate a high-risk dataset of 50,000 sensitive memorized samples as unlearning targets. We study two widely used gradient ascent-based unlearning approaches: the vanilla and constraint-based methods, and introduce CodeEraser, an advanced variant that selectively unlearns sensitive memorized segments in code while preserving the structural integrity and functional correctness of the surrounding code. Extensive experiments on three families of CLMs, i.e., CodeParrot, CodeGen-Mono, and Qwen2.5-Coder, validate the effectiveness and efficiency of CodeEraser in erasing targeted sensitive memorization while maintaining model utility.
- South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.05)
- North America > United States > New York > New York County > New York City (0.05)
- Asia > China > Hubei Province > Wuhan (0.04)
- (10 more...)
- Law (1.00)
- Information Technology > Security & Privacy (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)
Memorization Sinks: Isolating Memorization during LLM Training
Ghosal, Gaurav R., Maini, Pratyush, Raghunathan, Aditi
Large language models are susceptible to memorizing repeated sequences, posing privacy and copyright concerns. A popular mitigation strategy is to remove memorized information from specific neurons post-hoc. However, such approaches have shown limited success so far. In a controlled setting, we show that the memorization of natural sequences (those that resemble linguistically plausible text) become mechanistically entangled with general language abilities, thereby becoming challenging to remove post-hoc. In this work, we put forward a new paradigm of MemSinks that promotes isolation of memorization by design. We leverage a sequence identifier that activates a unique set of memorization neurons for each sequence across repetitions. By analyzing the dynamics of learning and forgetting, we argue that MemSinks facilitates isolation of memorized content, making it easier to remove without compromising general language capabilities. We implement MemSinks at the billion-parameter and billion-token scale, and observe both effective isolation and strong generalization. To our knowledge, this is the first proof-of-concept on real data demonstrating that simultaneous generalization and isolation is achievable. We open-source our code at http://github.com/grghosal/MemSinks.
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > Dominican Republic (0.04)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)
V-Math: An Agentic Approach to the Vietnamese National High School Graduation Mathematics Exams
Nguyen, Duong Q., Nguyen, Quy P., Van Nhon, Nguyen, Bui, Quang-Thinh, Nguyen-Xuan, H.
This paper develops an autonomous agentic framework called V-Math that aims to assist Vietnamese high school students in preparing for the National High School Graduation Mathematics Exams (NHSGMEs). The salient framework integrates three specialized AI agents: a specification-matrix-conditioned question generator, a solver/explainer for detailed step-by-step reasoning, and a personalized tutor that adapts to student performance. Beyond enabling self-paced student practice, V-Math supports teachers by generating innovative, compliant exam questions and building diverse, high-quality question banks. This reduces manual workload and enriches instructional resources. We describe the system architecture, focusing on practice modes for learners and teacher-oriented features for question generation. Preliminary evaluations demonstrate that V-Math produces matrix-aligned exams with high solution accuracy, delivers coherent explanations, and enhances the variety of practice materials. These results highlight its potential to support scalable, equitable mathematics preparation aligned with national standards while also empowering teachers through AI-assisted exam creation.
- Europe (0.14)
- Asia > Vietnam > Hải Dương Province > Hải Dương (0.04)
- Asia > Vietnam > Hồ Chí Minh City > Hồ Chí Minh City (0.04)
- (5 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- (3 more...)
Case-Based Decision-Theoretic Decoding with Quality Memories
Deguchi, Hiroyuki, Nagata, Masaaki
Minimum Bayes risk (MBR) decoding is a decision rule of text generation, which selects the hypothesis that maximizes the expected utility and robustly generates higher-quality texts than maximum a posteriori (MAP) decoding. However, it depends on sample texts drawn from the text generation model; thus, it is difficult to find a hypothesis that correctly captures the knowledge or information of out-of-domain. To tackle this issue, we propose case-based decision-theoretic (CBDT) decoding, another method to estimate the expected utility using examples of domain data. CBDT decoding not only generates higher-quality texts than MAP decoding, but also the combination of MBR and CBDT decoding outperformed MBR decoding in seven domain De--En and Ja$\leftrightarrow$En translation tasks and image captioning tasks on MSCOCO and nocaps datasets.
- Europe > Austria > Vienna (0.14)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- Asia > Thailand > Bangkok > Bangkok (0.04)
- (24 more...)
- Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Model-Based Reasoning (0.78)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (0.71)
- Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning (0.71)