AITopics | Rote Learning

Rote Learning

Memorization Over Reasoning? Exposing and Mitigating Verbatim Memorization in Large Language Models' Character Understanding Evaluation

arXiv.org Artificial Intelligence, Dec-29-2024

Recently, Large Language Models (LLMs) have shown impressive performance in character understanding tasks, such as analyzing the roles, personalities, and relationships of fictional characters. However, the extensive pre-training corpora used by LLMs raise concerns that they may rely on memorizing popular fictional works rather than genuinely understanding and reasoning about them. In this work, we argue that 'gist memory', which captures essential meaning, should be the primary mechanism for character understanding tasks, as opposed to 'verbatim memory', the exact matching of a string. We introduce a simple yet effective method to mitigate verbatim memorization in character understanding evaluations while preserving the essential implicit cues needed for comprehension and reasoning. Our approach reduces memorization-driven performance on popular fictional works from 96% accuracy to 72% and results in up to an 18% drop in accuracy across various character understanding tasks. These findings underscore the issue of data contamination in existing benchmarks, which often measure memorization rather than true character understanding.

large language model, machine learning, natural language

2412.14368

Country:

Asia > Myanmar > Tanintharyi Region > Dawei (0.05)
North America > United States > Washington > King County > Seattle (0.04)
North America > United States > Nevada > Clark County > Las Vegas (0.04)

Genre: Research Report > New Finding (0.66)

Industry:

Leisure & Entertainment (1.00)
Media > Television (0.48)
Media > Film (0.46)
Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)

Think or Remember? Detecting and Directing LLMs Towards Memorization or Generalization

Fu, Yi-Fu; Tu, Yu-Chieh; Cheng, Tzu-Ling; Lin, Cheng-Yu; Yang, Yi-Ting; Liu, Heng-Yi; Liao, Keng-Te; Juan, Da-Cheng; Lin, Shou-De

arXiv.org Artificial Intelligence, Dec-24-2024

In this paper, we explore the foundational mechanisms of memorization and generalization in Large Language Models (LLMs), inspired by the functional specialization observed in the human brain. Our investigation serves as a case study leveraging specially designed datasets and experimental-scale LLMs to lay the groundwork for understanding these behaviors. Specifically, we aim to first enable LLMs to exhibit both memorization and generalization by training with the designed dataset, then (a) examine whether LLMs exhibit neuron-level spatial differentiation for memorization and generalization, (b) predict these behaviors using model internal representations, and (c) steer the behaviors through inference-time interventions. Our findings reveal that neuron-wise differentiation of memorization and generalization is observable in LLMs, and targeted interventions can successfully direct their behavior.
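The abstract does not say how behaviors are predicted from internal representations or how they are steered. The sketch below assumes a generic difference-of-means probe. Hidden states recorded on examples labeled as memorization or generalization (hypothetical inputs) define a direction. Projecting a new state onto that direction predicts the behavior, and adding a scaled copy of the direction at inference time steers it. The names fit_direction, predict_memorization, and steer are illustrative and are not taken from the paper.

```python
from typing import List, Tuple

Vector = List[float]


def mean(vectors: List[Vector]) -> Vector:
    """Component-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def fit_direction(mem_states: List[Vector], gen_states: List[Vector]) -> Tuple[Vector, float]:
    """Direction from the generalization mean to the memorization mean, plus a
    threshold equal to the projection of the midpoint between the two means."""
    mu_mem, mu_gen = mean(mem_states), mean(gen_states)
    direction = [m - g for m, g in zip(mu_mem, mu_gen)]
    midpoint = [(m + g) / 2 for m, g in zip(mu_mem, mu_gen)]
    return direction, dot(direction, midpoint)


def predict_memorization(state: Vector, direction: Vector, threshold: float) -> bool:
    # A projection above the midpoint places the state on the memorization side.
    return dot(state, direction) > threshold


def steer(state: Vector, direction: Vector, alpha: float) -> Vector:
    # Hypothetical inference-time intervention: shift the hidden state along the
    # direction. alpha < 0 pushes toward generalization, alpha > 0 toward memorization.
    return [s + alpha * d for s, d in zip(state, direction)]
```

A real model would supply per-layer activations in place of these toy vectors. The choice of layer and the scale alpha would have to be tuned.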

large language model, machine learning, natural language

2412.18497

Genre: Research Report > New Finding (0.88)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)

Understanding and Mitigating Memorization in Diffusion Models for Tabular Data

Fang, Zhengyu; Jiang, Zhimeng; Chen, Huiyuan; Li, Xiao; Li, Jing

arXiv.org Artificial Intelligence, Dec-14-2024

Tabular data generation has attracted significant research interest in recent years, with tabular diffusion models greatly improving the quality of synthetic data. However, while memorization, where models inadvertently replicate exact or near-identical training data, has been thoroughly investigated in image and text generation, its effects on tabular data remain largely unexplored. In this paper, we conduct the first comprehensive investigation of memorization phenomena in diffusion models for tabular data. Our empirical analysis reveals that memorization appears in tabular diffusion models and increases with the number of training epochs. We further examine how factors such as dataset size, feature dimensionality, and the choice of diffusion model influence memorization. Additionally, we provide a theoretical explanation for why memorization occurs in tabular diffusion models. To address this issue, we propose TabCutMix, a simple yet effective data augmentation technique that exchanges randomly selected feature segments between randomly paired training samples of the same class. Building upon this, we introduce TabCutMixPlus, an enhanced method that clusters features based on feature correlations and ensures that features within the same cluster are exchanged together during augmentation. This clustering mechanism mitigates out-of-distribution (OOD) generation issues by maintaining feature coherence. Experimental results across various datasets and diffusion models demonstrate that TabCutMix effectively mitigates memorization while maintaining high-quality data generation.
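The following is a minimal sketch of the TabCutMix idea as the abstract describes it. It rests on two assumptions. First, a "feature segment" is read as a randomly chosen subset of columns. Second, swap_fraction is a hypothetical parameter that sets the size of that subset and should lie in (0, 1]. TabCutMixPlus would instead swap whole clusters of correlated features; that step is not shown.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

Row = List[float]


def tab_cutmix(
    rows: Sequence[Row],
    labels: Sequence[int],
    swap_fraction: float = 0.5,
    seed: Optional[int] = None,
) -> Tuple[List[Row], List[int]]:
    """Return augmented samples built by swapping a random subset of feature
    columns between randomly paired training samples of the same class."""
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    n_features = len(rows[0])
    k = max(1, round(swap_fraction * n_features))  # columns swapped per pair
    new_rows: List[Row] = []
    new_labels: List[int] = []
    for y, idxs in by_class.items():
        shuffled = list(idxs)
        rng.shuffle(shuffled)  # random same-class pairing
        # An odd leftover sample (or a singleton class) is simply not paired.
        for i, j in zip(shuffled[0::2], shuffled[1::2]):
            cols = rng.sample(range(n_features), k)
            a, b = list(rows[i]), list(rows[j])  # copies; originals untouched
            for c in cols:
                a[c], b[c] = b[c], a[c]
            new_rows.extend([a, b])
            new_labels.extend([y, y])
    return new_rows, new_labels
```

The augmented rows would typically be added to the original training set before the diffusion model is fit. Swapping only within a class keeps each label consistent with the features it is paired with.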

artificial intelligence, dataset, machine learning

2412.11044

Country:

North America > United States > Texas > Brazos County > College Station (0.14)
North America > United States > Ohio > Cuyahoga County > Cleveland (0.04)
Europe > Germany (0.04)
Asia > Taiwan (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine (1.00)
Banking & Finance > Credit (0.67)
Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)

When Can Memorization Improve Fairness?

Pepin, Bob; Igel, Christian; Selvan, Raghavendra

arXiv.org Artificial Intelligence, Dec-12-2024

We study the extent to which additive fairness metrics (statistical parity, equal opportunity, and equalized odds) can be influenced in a multi-class classification problem by memorizing a subset of the population. We give explicit expressions for the bias resulting from memorization in terms of the label and group-membership distribution of the memorized dataset and the classifier bias on the unmemorized dataset. We also characterize the memorized datasets that eliminate the bias for all three metrics considered. Finally, we provide upper and lower bounds on the total probability mass in the memorized dataset that is necessary for the complete elimination of these biases.
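The toy example below makes the effect concrete in a simplified setting: binary labels and two groups, whereas the paper treats the multi-class case. Memorized points receive their training labels, and the statistical parity difference is then recomputed. All data and helper names here are hypothetical.

```python
from typing import Hashable, List, Sequence


def positive_rate(preds: Sequence[int], groups: Sequence[Hashable], g: Hashable) -> float:
    """Fraction of members of group g that receive the positive prediction 1."""
    selected = [p for p, gg in zip(preds, groups) if gg == g]
    return sum(1 for p in selected if p == 1) / len(selected)


def statistical_parity_difference(preds, groups, g1, g2) -> float:
    return positive_rate(preds, groups, g1) - positive_rate(preds, groups, g2)


def predictions_with_memorization(base_preds, labels, memorized) -> List[int]:
    # A memorized point is predicted with its training label exactly;
    # every other point keeps the base classifier's prediction.
    return [y if m else p for p, y, m in zip(base_preds, labels, memorized)]


if __name__ == "__main__":
    groups = ["a", "a", "b", "b"]
    labels = [1, 0, 1, 0]
    base = [1, 1, 0, 0]  # hypothetical classifier biased toward group "a"
    print(statistical_parity_difference(base, groups, "a", "b"))  # 1.0
    mem = predictions_with_memorization(base, labels, [False, True, True, False])
    print(statistical_parity_difference(mem, groups, "a", "b"))  # 0.0
```

Here, memorizing one point from each group removes the statistical parity gap entirely. Which points must be memorized, and how much probability mass they need, is what the paper characterizes in general.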

artificial intelligence, classifier, machine learning

2412.09254

Country: Europe > Denmark > Capital Region > Copenhagen (0.04)

Genre: Research Report (0.65)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (0.64)

Differential learning kinetics govern the transition from memorization to generalization during in-context learning

Nguyen, Alex; Reddy, Gautam

arXiv.org Artificial Intelligence, Dec-12-2024

Transformers exhibit in-context learning (ICL): the ability to use novel information presented in the context without additional weight updates. Recent work shows that ICL emerges when models are trained on a sufficiently diverse set of tasks, and that the transition from memorization to generalization is sharp as task diversity increases. One interpretation is that a network's limited capacity to memorize favors generalization. Here, we examine the mechanistic underpinnings of this transition using a small transformer applied to a synthetic ICL task. Using theory and experiment, we show that the sub-circuits that memorize and generalize can be viewed as largely independent. The relative rates at which these sub-circuits learn, rather than capacity constraints, explain the transition from memorization to generalization. We uncover a memorization scaling law, which determines the task diversity threshold at which the network generalizes. The theory quantitatively explains a variety of other ICL-related phenomena, including the long-tailed distribution of when ICL is acquired, the bimodal behavior of solutions close to the task diversity threshold, the influence of contextual and data distributional statistics on ICL, and the transient nature of ICL.

icl, sequence, transition

2412.00104

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)

Localizing Memorization in SSL Vision Encoders

Wang, Wenhao; Dziedzic, Adam; Backes, Michael; Boenisch, Franziska

arXiv.org Artificial Intelligence, Dec-12-2024

Recent work on studying memorization in self-supervised learning (SSL) suggests that even though SSL encoders are trained on millions of images, they still memorize individual data points. While effort has been put into characterizing the memorized data and linking encoder memorization to downstream utility, little is known about where the memorization happens inside SSL encoders. To close this gap, we propose two metrics for localizing memorization in SSL encoders on a per-layer (LayerMem) and per-unit basis (UnitMem). Our localization methods are independent of the downstream task, do not require any label information, and can be performed in a forward pass. By localizing memorization in various encoder architectures (convolutional and transformer-based) trained on diverse datasets with contrastive and non-contrastive SSL frameworks, we find that (1) while SSL memorization increases with layer depth, highly memorizing units are distributed across the entire encoder, (2) a significant fraction of units in SSL encoders experiences surprisingly high memorization of individual data points, which is in contrast to models trained under supervision, (3) atypical (or outlier) data points cause much higher layer and unit memorization than standard data points, and (4) in vision transformers, most memorization happens in the fully-connected layers. Finally, we show that localizing memorization in SSL has the potential to improve fine-tuning and to inform pruning strategies.

encoder, memorization, unitmem

2409.19069

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > Dominican Republic (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)

MemHunter: Automated and Verifiable Memorization Detection at Dataset-scale in LLMs

Wu, Zhenpeng; Lou, Jian; Zheng, Zibin; Chen, Chuan

arXiv.org Artificial Intelligence, Dec-10-2024

Large language models (LLMs) have been shown to memorize and reproduce content from their training data, raising significant privacy concerns, especially with web-scale datasets. Existing methods for detecting memorization are largely sample-specific: they rely on manually crafted or discretely optimized memory-inducing prompts generated on a per-sample basis, which makes them impractical for dataset-level detection because iterating over all samples is computationally prohibitive. In real-world scenarios, data owners may need to verify whether a suspect LLM has memorized their dataset, particularly if the LLM may have collected the data from the web without authorization. To address this, we introduce MemHunter, which trains a memory-inducing LLM and employs hypothesis testing to efficiently detect memorization at the dataset level, without requiring sample-specific memory-inducing prompts. Experiments on models such as Pythia and Llama-2 demonstrate that MemHunter can extract up to 40% more training data than existing methods under constrained time resources and reduce search time by up to 80% when integrated as a plug-in. Crucially, MemHunter is the first method capable of dataset-level memorization detection, providing an indispensable tool for assessing privacy risks in LLMs that are powered by vast web-sourced datasets.
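The abstract does not give MemHunter's test statistic. As a generic illustration of dataset-level hypothesis testing, the sketch below runs a one-sided permutation test. It asks whether per-sample memorization scores on a candidate dataset exceed those on reference data known to be unseen. The scores could be extraction success or negative loss; both are hypothetical here. This is a stand-in technique, not MemHunter's procedure.

```python
import random
from statistics import mean
from typing import Sequence


def permutation_test(
    target_scores: Sequence[float],
    reference_scores: Sequence[float],
    n_permutations: int = 10_000,
    seed: int = 0,
) -> float:
    """One-sided permutation test.

    H0: target and reference scores are exchangeable (same distribution).
    H1: target scores are larger on average, i.e. the dataset looks memorized.
    Returns a p-value; small values are evidence against H0.
    """
    rng = random.Random(seed)
    observed = mean(target_scores) - mean(reference_scores)
    pooled = list(target_scores) + list(reference_scores)
    n = len(target_scores)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # random relabeling of which scores count as "target"
        if mean(pooled[:n]) - mean(pooled[n:]) >= observed:
            hits += 1
    # The +1 terms count the observed split itself, so p is never exactly 0.
    return (hits + 1) / (n_permutations + 1)
```

Testing the whole dataset at once is what avoids one detection run per sample. The price is that the result is a single verdict for the dataset, not a list of which records were memorized.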

large language model, machine learning, natural language

2412.07261

Country:

Asia > China > Guangdong Province > Zhuhai (0.04)
Asia > China > Guangdong Province > Guangzhou (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)

DeMem: Privacy-Enhanced Robust Adversarial Learning via De-Memorization

Luo, Xiaoyu; Li, Qiongxiu

arXiv.org Artificial Intelligence, Dec-10-2024

Adversarial robustness, the ability of a model to withstand manipulated inputs that cause errors, is essential for ensuring the trustworthiness of machine learning models in real-world applications. However, previous studies have shown that enhancing adversarial robustness through adversarial training increases vulnerability to privacy attacks. While differential privacy can mitigate these attacks, it often compromises robustness against both natural and adversarial samples. Our analysis reveals that differential privacy disproportionately impacts low-risk samples, causing an unintended performance drop. To address this, we propose DeMem, which selectively targets high-risk samples, achieving a better balance between privacy protection and model robustness. DeMem is versatile and can be seamlessly integrated into various adversarial training techniques. Extensive evaluations across multiple training methods and datasets demonstrate that DeMem significantly reduces privacy leakage while maintaining robustness against both natural and adversarial samples. These results confirm DeMem's effectiveness and broad applicability in enhancing privacy without compromising robustness.
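The abstract does not say how DeMem decides which samples are high-risk. For illustration only, the sketch assumes risk is measured by a subsampling memorization estimate: the accuracy on an example of models trained with it, minus the accuracy of models trained without it. It computes these scores from hypothetical records of many models and flags samples above a chosen threshold. Only the flagged subset would then receive the privacy treatment, which is the selective idea the abstract describes.

```python
from typing import List, Sequence


def memorization_scores(
    train_masks: Sequence[Sequence[bool]],
    correct: Sequence[Sequence[bool]],
) -> List[float]:
    """train_masks[m][i]: example i was in model m's training subset.
    correct[m][i]: model m classified example i correctly.
    score_i = accuracy on i of models trained with i
              minus accuracy on i of models trained without i.
    """
    n_examples = len(train_masks[0])
    scores: List[float] = []
    for i in range(n_examples):
        with_i = [c[i] for mask, c in zip(train_masks, correct) if mask[i]]
        without_i = [c[i] for mask, c in zip(train_masks, correct) if not mask[i]]
        if not with_i or not without_i:
            scores.append(0.0)  # not estimable from these models; treated as no evidence
            continue
        scores.append(sum(with_i) / len(with_i) - sum(without_i) / len(without_i))
    return scores


def high_risk_indices(scores: Sequence[float], threshold: float) -> List[int]:
    # Only these samples would receive the (hypothetical) privacy treatment.
    return [i for i, s in enumerate(scores) if s >= threshold]
```

A sample scores high when models classify it correctly only if they trained on it, which is the pattern membership-inference attacks exploit.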

memorization score, privacy leakage, robustness

2412.05767

Country:

Europe > Italy (0.04)
Europe > Denmark > North Jutland > Aalborg (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

The Pitfalls of Memorization: When Memorization Hurts Generalization

Bayat, Reza; Pezeshki, Mohammad; Dohmatob, Elvis; Lopez-Paz, David; Vincent, Pascal

arXiv.org Machine Learning, Dec-10-2024

Neural networks often learn simple explanations that fit the majority of the data while memorizing exceptions that deviate from these explanations. This behavior leads to poor generalization when the learned explanations rely on spurious correlations. In this work, we formalize the interplay between memorization and generalization, showing that spurious correlations lead to particularly poor generalization when combined with memorization. Memorization can reduce training loss to zero, leaving no incentive to learn robust, generalizable patterns. To address this, we propose memorization-aware training (MAT), which uses held-out predictions as a signal of memorization to shift a model's logits. MAT encourages learning robust patterns invariant across distributions, improving generalization under distribution shifts.
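One plausible reading of shifting logits with held-out predictions follows the spirit of logit adjustment. Under this reading, the log of the held-out class probabilities is added to the training logits before the cross-entropy is computed. The sketch implements that reading for a single example. It is an assumption, and MAT's exact formulation may differ.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_z for z in logits]


def shifted_cross_entropy(
    logits: Sequence[float],
    heldout_probs: Sequence[float],
    label: int,
    eps: float = 1e-12,
) -> float:
    """Cross-entropy after adding log held-out class probabilities to the logits.

    When the held-out model already assigns the true label high probability,
    the shift alone makes the loss small, so the example contributes little
    gradient; examples the held-out model gets wrong dominate training.
    """
    shifted = [z + math.log(max(p, eps)) for z, p in zip(logits, heldout_probs)]
    return -log_softmax(shifted)[label]
```

With uniform held-out probabilities the shift is a constant and the loss reduces to ordinary cross-entropy. The held-out probabilities would come from models that never saw the example, for instance via cross-validation, so memorizing that example cannot lower them.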

example-specific feature, generalization, memorization

2412.07684

Country:

North America > United States > California (0.04)
North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)

Exploring Memorization and Copyright Violation in Frontier LLMs: A Study of the New York Times v. OpenAI 2023 Lawsuit

Freeman, Joshua; Rippe, Chloe; Debenedetti, Edoardo; Andriushchenko, Maksym

arXiv.org Artificial Intelligence, Dec-9-2024

Our work aims to measure the propensity of OpenAI's LLMs to exhibit verbatim memorization in their outputs relative to other LLMs, focusing specifically on news articles. We discover that both GPT and Claude models use refusal training and output filters to prevent verbatim output of memorized articles. We apply a basic prompt template to bypass the refusal training and show that OpenAI models are currently less prone to memorization elicitation than models from Meta, Mistral, and Anthropic. We find that as models increase in size, especially beyond 100 billion parameters, they demonstrate significantly greater capacity for memorization. Our findings have practical implications for training: more attention must be placed on preventing verbatim memorization in very large models.
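The abstract does not define its verbatim-memorization metric. A common, simple proxy is the longest run of consecutive tokens shared by a model output and the source article. The sketch below computes it with Python's difflib over whitespace tokens. The 50-token threshold in is_verbatim is a hypothetical default, not a value from the paper.

```python
from difflib import SequenceMatcher


def longest_verbatim_run(generated: str, reference: str) -> int:
    """Length, in whitespace tokens, of the longest contiguous token sequence
    shared by the model output and the reference article."""
    a, b = generated.split(), reference.split()
    # autojunk=False disables difflib's popular-element heuristic, so frequent
    # tokens such as "the" are still allowed to take part in matches.
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return match.size


def is_verbatim(generated: str, reference: str, min_tokens: int = 50) -> bool:
    # min_tokens is an illustrative cut-off, not a value taken from the paper.
    return longest_verbatim_run(generated, reference) >= min_tokens
```

Counting tokens instead of characters keeps the threshold roughly comparable across articles. A subword tokenizer would give different counts than whitespace splitting.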

large language model, machine learning, natural language

2412.0637

Country:

Europe > France (0.14)
North America > United States > Texas (0.04)
Europe > Switzerland > Zürich > Zürich (0.04)

Genre: Research Report > New Finding (0.88)

Industry:

Law > Litigation (1.00)
Law > Intellectual Property & Technology Law (1.00)
Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.94)
