Rote Learning
Unveiling Over-Memorization in Finetuning LLMs for Reasoning Tasks
Ruan, Zhiwen, Chen, Yun, Hou, Yutao, Li, Peng, Liu, Yang, Chen, Guanhua
The pretrained large language models (LLMs) are finetuned with labeled data for better instruction following ability and alignment with human values. In this paper, we study the learning dynamics of LLM finetuning on reasoning tasks and reveal the uncovered over-memorization phenomenon during a specific stage of LLM finetuning. At this stage, the LLMs have excessively memorized training data and exhibit high test perplexity while maintaining good test accuracy. We explore the conditions that contribute to over-memorization and discover that this issue is prevalent across various tasks, models, and fine-tuning methods, with prolonged training and large learning rates exacerbating the problem. Although models with over-memorization demonstrate comparable test accuracy to normal models, they suffer from reduced robustness, poor out-of-distribution generalization, and decreased generation diversity. In light of our findings on over-memorization, we offer recommendations for checkpoint selection and propose techniques such as checkpoint merging and memorization-aware reweighting to mitigate this effect.
Entropy-Memorization Law: Evaluating Memorization Difficulty of Data in LLMs
Huang, Yizhan, Yang, Zhe, Chen, Meifang, Nianchen, Huang, Zhang, Jianping, Lyu, Michael R.
Large Language Models (LLMs) are known to memorize portions of their training data, sometimes reproducing content verbatim when prompted appropriately. In this work, we investigate a fundamental yet under-explored question in the domain of memorization: How to characterize memorization difficulty of training data in LLMs? Through empirical experiments on OLMo, a family of open models, we present the Entropy-Memorization Law. It suggests that data entropy is linearly correlated with memorization score. Moreover, in a case study of memorizing highly randomized strings, or "gibberish", we observe that such sequences, despite their apparent randomness, exhibit unexpectedly low empirical entropy compared to the broader training corpus. Adopting the same strategy to discover Entropy-Memorization Law, we derive a simple yet effective approach to distinguish training and testing data, enabling Dataset Inference (DI).
Memorization or Reasoning? Exploring the Idiom Understanding of LLMs
Kim, Jisu, Shin, Youngwoo, Hwang, Uiji, Choi, Jihun, Xuan, Richeng, Kim, Taeuk
Idioms have long posed a challenge due to their unique linguistic properties, which set them apart from other common expressions. While recent studies have leveraged large language models (LLMs) to handle idioms across various tasks, e.g., idiom-containing sentence generation and idiomatic machine translation, little is known about the underlying mechanisms of idiom processing in LLMs, particularly in multilingual settings. To this end, we introduce MIDAS, a new large-scale dataset of idioms in six languages, each paired with its corresponding meaning. Leveraging this resource, we conduct a comprehensive evaluation of LLMs' idiom processing ability, identifying key factors that influence their performance. Our findings suggest that LLMs rely not only on memorization, but also adopt a hybrid approach that integrates contextual cues and reasoning, especially when processing compositional idioms. This implies that idiom understanding in LLMs emerges from an interplay between internal knowledge retrieval and reasoning-based inference.
Scrub It Out! Erasing Sensitive Memorization in Code Language Models via Machine Unlearning
Chu, Zhaoyang, Wan, Yao, Zhang, Zhikun, Wang, Di, Yang, Zhou, Zhang, Hongyu, Zhou, Pan, Shi, Xuanhua, Jin, Hai, Lo, David
While Code Language Models (CLMs) have demonstrated superior performance in software engineering tasks such as code generation and summarization, recent empirical studies reveal a critical privacy vulnerability: these models exhibit unintended memorization of sensitive training data, enabling verbatim reproduction of confidential information when specifically prompted. To address this issue, several approaches, including training data de-duplication and differential privacy augmentation, have been proposed. However, these methods require full-model retraining for deployed CLMs, which incurs substantial computational costs. In this paper, we aim to answer the following research question: Can sensitive information memorized by CLMs be erased effectively and efficiently? We conduct a pioneering investigation into erasing sensitive memorization in CLMs through machine unlearning - a post-hoc modification method that removes specific information from trained models without requiring full retraining. Specifically, we first quantify the memorization risks of sensitive data within CLM training datasets and curate a high-risk dataset of 50,000 sensitive memorized samples as unlearning targets. We study two widely used gradient ascent-based unlearning approaches: the vanilla and constraint-based methods, and introduce CodeEraser, an advanced variant that selectively unlearns sensitive memorized segments in code while preserving the structural integrity and functional correctness of the surrounding code. Extensive experiments on three families of CLMs, i.e., CodeParrot, CodeGen-Mono, and Qwen2.5-Coder, validate the effectiveness and efficiency of CodeEraser in erasing targeted sensitive memorization while maintaining model utility.
Memorization Sinks: Isolating Memorization during LLM Training
Ghosal, Gaurav R., Maini, Pratyush, Raghunathan, Aditi
Large language models are susceptible to memorizing repeated sequences, posing privacy and copyright concerns. A popular mitigation strategy is to remove memorized information from specific neurons post-hoc. However, such approaches have shown limited success so far. In a controlled setting, we show that the memorization of natural sequences (those that resemble linguistically plausible text) become mechanistically entangled with general language abilities, thereby becoming challenging to remove post-hoc. In this work, we put forward a new paradigm of MemSinks that promotes isolation of memorization by design. We leverage a sequence identifier that activates a unique set of memorization neurons for each sequence across repetitions. By analyzing the dynamics of learning and forgetting, we argue that MemSinks facilitates isolation of memorized content, making it easier to remove without compromising general language capabilities. We implement MemSinks at the billion-parameter and billion-token scale, and observe both effective isolation and strong generalization. To our knowledge, this is the first proof-of-concept on real data demonstrating that simultaneous generalization and isolation is achievable. We open-source our code at http://github.com/grghosal/MemSinks.
Memorization $\neq$ Understanding: Do Large Language Models Have the Ability of Scenario Cognition?
Ma, Boxiang, Li, Ru, Wang, Yuanlong, Tan, Hongye, Li, Xiaoli
Driven by vast and diverse textual data, large language models (LLMs) have demonstrated impressive performance across numerous natural language processing (NLP) tasks. Yet, a critical question persists: does their generalization arise from mere memorization of training data or from deep semantic understanding? To investigate this, we propose a bi-perspective evaluation framework to assess LLMs' scenario cognition - the ability to link semantic scenario elements with their arguments in context. Specifically, we introduce a novel scenario-based dataset comprising diverse textual descriptions of fictional facts, annotated with scenario elements. LLMs are evaluated through their capacity to answer scenario-related questions (model output perspective) and via probing their internal representations for encoded scenario elements-argument associations (internal representation perspective). Our experiments reveal that current LLMs predominantly rely on superficial memorization, failing to achieve robust semantic scenario cognition, even in simple cases. These findings expose critical limitations in LLMs' semantic understanding and offer cognitive insights for advancing their capabilities.
Localizing and Mitigating Memorization in Image Autoregressive Models
Kasliwal, Aditya, Boenisch, Franziska, Dziedzic, Adam
Image AutoRegressive (IAR) models have achieved state-of-the-art performance in speed and quality of generated images. However, they also raise concerns about memorization of their training data and its implications for privacy. This work explores where and how such memorization occurs within different image autoregressive architectures by measuring a fine-grained memorization. The analysis reveals that memorization patterns differ across various architectures of IARs. In hierarchical per-resolution architectures, it tends to emerge early and deepen with resolutions, while in IARs with standard autoregressive per token prediction, it concentrates in later processing stages. These localization of memorization patterns are further connected to IARs' ability to memorize and leak training data. By intervening on their most memorizing components, we significantly reduce the capacity for data extraction from IARs with minimal impact on the quality of generated images. These findings offer new insights into the internal behavior of image generative models and point toward practical strategies for mitigating privacy risks.
Data Cartography for Detecting Memorization Hotspots and Guiding Data Interventions in Generative Models
Modern generative models risk overfitting and unintentionally memorizing rare training examples, which can be extracted by adversaries or inflate benchmark performance. We propose Generative Data Cartography (GenDataCarto), a data-centric framework that assigns each pretraining sample a difficulty score (early-epoch loss) and a memorization score (frequency of ``forget events''), then partitions examples into four quadrants to guide targeted pruning and up-/down-weighting. We prove that our memorization score lower-bounds classical influence under smoothness assumptions and that down-weighting high-memorization hotspots provably decreases the generalization gap via uniform stability bounds. Empirically, GenDataCarto reduces synthetic canary extraction success by over 40\% at just 10\% data pruning, while increasing validation perplexity by less than 0.5\%. These results demonstrate that principled data interventions can dramatically mitigate leakage with minimal cost to generative performance.
Beyond Frequency: The Role of Redundancy in Large Language Model Memorization
Zhang, Jie, Zhao, Qinghua, Lin, Chi-ho, Kang, Zhongfeng, Li, Lei
Memorization in large language models poses critical risks for privacy and fairness as these systems scale to billions of parameters. While previous studies established correlations between memorization and factors like token frequency and repetition patterns, we revealed distinct response patterns: frequency increases minimally impact memorized samples (e.g. 0.09) while substantially affecting non-memorized samples (e.g., 0.25), with consistency observed across model scales. Through counterfactual analysis by perturbing sample prefixes and quantifying perturbation strength through token positional changes, we demonstrate that redundancy correlates with memorization patterns. Our findings establish that: about 79% of memorized samples are low-redundancy, these low-redundancy samples exhibit 2-fold higher vulnerability than high-redundancy ones, and consequently memorized samples drop by 0.6 under perturbation while non-memorized samples drop by only 0.01, indicating that more redundant content becomes both more memorable and more fragile. These findings suggest potential redundancy-guided approaches for data preprocessing, thereby reducing privacy risks and mitigating bias to ensure fairness in model deployments.
On the Edge of Memorization in Diffusion Models
Buchanan, Sam, Pai, Druv, Ma, Yi, De Bortoli, Valentin
When do diffusion models reproduce their training data, and when are they able to generate samples beyond it? A practically relevant theoretical understanding of this interplay between memorization and generalization may significantly impact real-world deployments of diffusion models with respect to issues such as copyright infringement and data privacy. In this work, to disentangle the different factors that influence memorization and generalization in practical diffusion models, we introduce a scientific and mathematical "laboratory" for investigating these phenomena in diffusion models trained on fully synthetic or natural image-like structured data. Within this setting, we hypothesize that the memorization or generalization behavior of an underparameterized trained model is determined by the difference in training loss between an associated memorizing model and a generalizing model. To probe this hypothesis, we theoretically characterize a crossover point wherein the weighted training loss of a fully generalizing model becomes greater than that of an underparameterized memorizing model at a critical value of model (under)parameterization. We then demonstrate via carefully-designed experiments that the location of this crossover predicts a phase transition in diffusion models trained via gradient descent, validating our hypothesis. Ultimately, our theory enables us to analytically predict the model size at which memorization becomes predominant. Our work provides an analytically tractable and practically meaningful setting for future theoretical and empirical investigations. Code for our experiments is available at https://github.com/DruvPai/diffusion_mem_gen.