AITopics | Memory-Based Learning

Collaborating Authors

Memory-Based Learning

[Sometimes called Case-Based Reasoning or CBR]
"At the highest level of generality, a general CBR cycle may be described by the following four processes: 1. RETRIEVE the most similar case or cases. 2. REUSE the information and knowledge in that case to solve the problem. 3. REVISE the proposed solution. 4. RETAIN the parts of this experience likely to be useful for future problem solving "– from Case-Based Reasoning: Foundational Issues, Methodological Variations, and System Approaches. By A. Aamodt and E. Plaza. (1994)

News Overviews Instructional Materials AI-Alerts Classics

Exploring Memorization and Copyright Violation in Frontier LLMs: A Study of the New York Times v. OpenAI 2023 Lawsuit

Freeman, Joshua, Rippe, Chloe, Debenedetti, Edoardo, Andriushchenko, Maksym

arXiv.org Artificial IntelligenceDec-9-2024

Our work aims to measure the propensity of OpenAI's LLMs to exhibit verbatim memorization in its outputs relative to other LLMs, specifically focusing on news articles. We discover that both GPT and Claude models use refusal training and output filters to prevent verbatim output of the memorized articles. We apply a basic prompt template to bypass the refusal training and show that OpenAI models are currently less prone to memorization elicitation than models from Meta, Mistral, and Anthropic. We find that as models increase in size, especially beyond 100 billion parameters, they demonstrate significantly greater capacity for memorization. Our findings have practical implications for training: more attention must be placed on preventing verbatim memorization in very large models.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2412.0637

Country:

Europe > France (0.14)
North America > United States > Texas (0.04)
Europe > Switzerland > Zürich > Zürich (0.04)
(4 more...)

Genre: Research Report > New Finding (0.88)

Industry:

Law > Litigation (1.00)
Law > Intellectual Property & Technology Law (1.00)
Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.94)

Add feedback

Understanding Memorization in Generative Models via Sharpness in Probability Landscapes

Jeon, Dongjae, Kim, Dueun, No, Albert

arXiv.org Artificial IntelligenceDec-5-2024

In this paper, we introduce a geometric framework to analyze memorization in diffusion models using the eigenvalues of the Hessian of the log probability density. We propose that memorization arises from isolated points in the learned probability distribution, characterized by sharpness in the probability landscape, as indicated by large negative eigenvalues of the Hessian. Through experiments on various datasets, we demonstrate that these eigenvalues effectively detect and quantify memorization. Our approach provides a clear understanding of memorization in diffusion models and lays the groundwork for developing strategies to ensure secure and reliable generative models

diffusion model, eigenvalue, memorization, (13 more...)

arXiv.org Artificial Intelligence

2412.0414

Country:

Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
Asia > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)

Add feedback

Detecting Memorization in Large Language Models

Slonski, Eduardo

arXiv.org Artificial IntelligenceDec-1-2024

Large language models (LLMs) have achieved impressive results in natural language processing but are prone to memorizing portions of their training data, which can compromise evaluation metrics, raise privacy concerns, and limit generalization. Traditional methods for detecting memorization rely on output probabilities or loss functions, often lacking precision due to confounding factors like common language patterns. In this paper, we introduce an analytical method that precisely detects memorization by examining neuron activations within the LLM. By identifying specific activation patterns that differentiate between memorized and not memorized tokens, we train classification probes that achieve near-perfect accuracy. The approach can also be applied to other mechanisms, such as repetition, as demonstrated in this study, highlighting its versatility. Intervening on these activations allows us to suppress memorization without degrading overall performance, enhancing evaluation integrity by ensuring metrics reflect genuine generalization. Additionally, our method supports large-scale labeling of tokens and sequences, crucial for next-generation AI models, improving training efficiency and results. Our findings contribute to model interpretability and offer practical tools for analyzing and controlling internal mechanisms in LLMs.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2412.01014

Country: Europe (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)

Add feedback

From memorization to generalization: a theoretical framework for diffusion-based generative models

Halder, Indranil

arXiv.org Artificial IntelligenceNov-26-2024

Diffusion-based generative models demonstrate a transition from memorizing the training dataset to a non-memorization regime as the size of the training set increases. Here, we begin by introducing a mathematically precise definition of this transition in terms of a relative distance: the model is said to be in the non-memorization/`generalization' regime if the generated distribution is almost surely far from the probability distribution associated with a Gaussian kernel approximation to the training dataset, relative to the sampling distribution. Then, we develop an analytically tractable diffusion model and establish a lower bound on Kullback-Leibler divergence between the generated and sampling distribution. The model also features the transition, according to our definition in terms of the relative distance, when the training data is sampled from an isotropic Gaussian distribution. Further, our study reveals that this transition occurs when the individual distance between the generated and underlying sampling distribution begins to decrease with the addition of more training samples. This is to be contrasted with an alternative scenario, where the model's memorization performance degrades, but generalization performance doesn't improve. We also provide empirical evidence indicating that realistic diffusion models exhibit the same alignment of scales.

diffusion-based generative model, machine learning, natural language, (4 more...)

arXiv.org Artificial Intelligence

2411.17807

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Generation (0.60)

Add feedback

Classifier-Free Guidance inside the Attraction Basin May Cause Memorization

Jain, Anubhav, Kobayashi, Yuya, Shibuya, Takashi, Takida, Yuhta, Memon, Nasir, Togelius, Julian, Mitsufuji, Yuki

arXiv.org Artificial IntelligenceNov-23-2024

Diffusion models are prone to exactly reproduce images from the training data. This exact reproduction of the training data is concerning as it can lead to copyright infringement and/or leakage of privacy-sensitive information. In this paper, we present a novel way to understand the memorization phenomenon, and propose a simple yet effective approach to mitigate it. We argue that memorization occurs because of an attraction basin in the denoising process which steers the diffusion trajectory towards a memorized image. However, this can be mitigated by guiding the diffusion trajectory away from the attraction basin by not applying classifier-free guidance until an ideal transition point occurs from which classifier-free guidance is applied. This leads to the generation of non-memorized images that are high in image quality and well-aligned with the conditioning mechanism. To further improve on this, we present a new guidance technique, \emph{opposite guidance}, that escapes the attraction basin sooner in the denoising process. We demonstrate the existence of attraction basins in various scenarios in which memorization occurs, and we show that our proposed approach successfully mitigates memorization.

artificial intelligence, machine learning, memorization, (17 more...)

arXiv.org Artificial Intelligence

2411.16738

Country:

North America > United States > New York (0.04)
North America > United States > Connecticut > New Haven County > New Haven (0.04)
Europe > Netherlands (0.04)
Europe > Iceland > Capital Region > Reykjavik (0.04)

Genre: Research Report (0.40)

Industry:

Information Technology > Security & Privacy (0.54)
Transportation (0.46)
Media (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.46)

Add feedback

Memorization in Attention-only Transformers

Dana, Léo, Pydi, Muni Sreenivas, Chevaleyre, Yann

arXiv.org Artificial IntelligenceNov-15-2024

Recent research has explored the memorization capacity of multi-head attention, but these findings are constrained by unrealistic limitations on the context size. We present a novel proof for language-based Transformers that extends the current hypothesis to any context size. Our approach improves upon the state-of-the-art by achieving more effective exact memorization with an attention layer, while also introducing the concept of approximate memorization of distributions. Through experimental validation, we demonstrate that our proposed bounds more accurately reflect the true memorization capacity of language models, and provide a precise comparison with prior work.

artificial intelligence, machine learning, transformer, (18 more...)

arXiv.org Artificial Intelligence

2411.10115

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)

Add feedback

Continual Memorization of Factoids in Large Language Models

Chen, Howard, Geng, Jiayi, Bhaskar, Adithya, Friedman, Dan, Chen, Danqi

arXiv.org Artificial IntelligenceNov-11-2024

Large language models can absorb a massive amount of knowledge through pretraining, but pretraining is inefficient for acquiring long-tailed or specialized facts. Therefore, fine-tuning on specialized or new knowledge that reflects changes in the world has become popular, though it risks disrupting the model's original capabilities. We study this fragility in the context of continual memorization, where the model is trained on a small set of long-tail factoids (factual associations) and must retain these factoids after multiple stages of subsequent training on other datasets. Through extensive experiments, we show that LLMs suffer from forgetting across a wide range of subsequent tasks, and simple replay techniques do not fully prevent forgetting, especially when the factoid datasets are trained in the later stages. We posit that there are two ways to alleviate forgetting: 1) protect the memorization process as the model learns the factoids, or 2) reduce interference from training in later stages. With this insight, we develop an effective mitigation strategy: REMIX (Random and Generic Data Mixing). REMIX prevents forgetting by mixing generic data sampled from pretraining corpora or even randomly generated word sequences during each stage, despite being unrelated to the memorized factoids in the first stage. REMIX can recover performance from severe forgetting, often outperforming replay-based methods that have access to the factoids from the first stage. We then analyze how REMIX alters the learning process and find that successful forgetting prevention is associated with a pattern: the model stores factoids in earlier layers than usual and diversifies the set of layers that store these factoids. The efficacy of REMIX invites further investigation into the underlying dynamics of memorization and forgetting, opening exciting possibilities for future research.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2411.07175

Country:

North America > United States > Kansas (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)

Genre: Research Report (0.82)

Industry: Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)

Add feedback

Scalability of memorization-based machine unlearning

Zhao, Kairan, Triantafillou, Peter

arXiv.org Artificial IntelligenceNov-1-2024

Machine unlearning (MUL) focuses on removing the influence of specific subsets of data (such as noisy, poisoned, or privacy-sensitive data) from pretrained models. MUL methods typically rely on specialized forms of fine-tuning. Recent research has shown that data memorization is a key characteristic defining the difficulty of MUL. As a result, novel memorization-based unlearning methods have been developed, demonstrating exceptional performance with respect to unlearning quality, while maintaining high performance for model utility. Alas, these methods depend on knowing the memorization scores of data points and computing said scores is a notoriously time-consuming process. This in turn severely limits the scalability of these solutions and their practical impact for real-world applications. In this work, we tackle these scalability challenges of state-of-the-art memorization-based MUL algorithms using a series of memorization-score proxies. We first analyze the profiles of various proxies and then evaluate the performance of state-of-the-art (memorization-based) MUL algorithms in terms of both accuracy and privacy preservation. Our empirical results show that these proxies can introduce accuracy on par with full memorization-based unlearning while dramatically improving scalability. We view this work as an important step toward scalable and efficient machine unlearning.

artificial intelligence, machine learning, proxy, (15 more...)

arXiv.org Artificial Intelligence

2410.16516

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)

Add feedback

Generalizability of Memorization Neural Networks

Yu, Lijia, Gao, Xiao-Shan, Zhang, Lijun, Miao, Yibo

arXiv.org Artificial IntelligenceNov-1-2024

The neural network memorization problem is to study the expressive power of neural networks to interpolate a finite dataset. Although memorization is widely believed to have a close relationship with the strong generalizability of deep learning when using over-parameterized models, to the best of our knowledge, there exists no theoretical study on the generalizability of memorization neural networks. In this paper, we give the first theoretical analysis of this topic. Since using i.i.d. training data is a necessary condition for a learning algorithm to be generalizable, memorization and its generalization theory for i.i.d. datasets are developed under mild conditions on the data distribution. First, algorithms are given to construct memorization networks for an i.i.d. dataset, which have the smallest number of parameters and even a constant number of parameters. Second, we show that, in order for the memorization networks to be generalizable, the width of the network must be at least equal to the dimension of the data, which implies that the existing memorization networks with an optimal number of parameters are not generalizable. Third, a lower bound for the sample complexity of general memorization algorithms and the exact sample complexity for memorization algorithms with constant number of parameters are given. It is also shown that there exist data distributions such that, to be generalizable for them, the memorization network must have an exponential number of parameters in the data dimension. Finally, an efficient and generalizable memorization algorithm is given when the number of training samples is greater than the efficient memorization sample complexity of the data distribution.

artificial intelligence, machine learning, memorization network, (15 more...)

arXiv.org Artificial Intelligence

2411.00372

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)

Add feedback

A Geometric Framework for Understanding Memorization in Generative Models

Ross, Brendan Leigh, Kamkari, Hamidreza, Wu, Tongzi, Hosseinzadeh, Rasa, Liu, Zhaoyan, Stein, George, Cresswell, Jesse C., Loaiza-Ganem, Gabriel

arXiv.org Machine LearningOct-31-2024

As deep generative models have progressed, recent work has shown them to be capable of memorizing and reproducing training datapoints when deployed. These findings call into question the usability of generative models, especially in light of the legal and privacy risks brought about by memorization. To better understand this phenomenon, we propose the manifold memorization hypothesis (MMH), a geometric framework which leverages the manifold hypothesis into a clear language in which to reason about memorization. We propose to analyze memorization in terms of the relationship between the dimensionalities of $(i)$ the ground truth data manifold and $(ii)$ the manifold learned by the model. This framework provides a formal standard for "how memorized" a datapoint is and systematically categorizes memorized data into two types: memorization driven by overfitting and memorization driven by the underlying data distribution. By analyzing prior work in the context of the MMH, we explain and unify assorted observations in the literature. We empirically validate the MMH using synthetic data and image datasets up to the scale of Stable Diffusion, developing new tools for detecting and preventing generation of memorized samples in the process.

artificial intelligence, machine learning, memorization, (18 more...)

arXiv.org Machine Learning

2411.00113

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Asia > Japan > Honshū > Kantō > Kanagawa Prefecture (0.04)
North America > United States > New York (0.04)

Genre: Research Report > New Finding (0.87)

Industry:

Leisure & Entertainment (1.00)
Law (0.87)
Media > Film (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.88)

Add feedback