AITopics | Memory-Based Learning

Collaborating Authors

Memory-Based Learning

[Sometimes called Case-Based Reasoning or CBR]
"At the highest level of generality, a general CBR cycle may be described by the following four processes: 1. RETRIEVE the most similar case or cases. 2. REUSE the information and knowledge in that case to solve the problem. 3. REVISE the proposed solution. 4. RETAIN the parts of this experience likely to be useful for future problem solving "– from Case-Based Reasoning: Foundational Issues, Methodological Variations, and System Approaches. By A. Aamodt and E. Plaza. (1994)

News Overviews Instructional Materials AI-Alerts Classics

Memorization and Optimization in Deep Neural Networks with Minimum Over-parameterization

Neural Information Processing SystemsOct-10-2024, 13:48:00 GMT

The Neural Tangent Kernel (NTK) has emerged as a powerful tool to provide memorization, optimization and generalization guarantees in deep neural networks. A line of work has studied the NTK spectrum for two-layer and deep networks with at least a layer with \Omega(N) neurons, N being the number of training samples. Furthermore, there is increasing evidence suggesting that deep networks with sub-linear layer widths are powerful memorizers and optimizers, as long as the number of parameters exceeds the number of samples. Thus, a natural open question is whether the NTK is well conditioned in such a challenging sub-linear setup. In this paper, we answer this question in the affirmative. Our key technical contribution is a lower bound on the smallest NTK eigenvalue for deep networks with the minimum possible over-parameterization: up to logarithmic factors, the number of parameters is \Omega(N) and, hence, the number of neurons is as little as \Omega(\sqrt{N}) .

deep neural network, memorization and optimization, minimum over-parameterization, (4 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.65)

Add feedback

Neural Networks Learning and Memorization with (almost) no Over-Parameterization

Neural Information Processing SystemsOct-10-2024, 10:05:22 GMT

Many results in recent years established polynomial time learnability of various models via neural networks algorithms (e.g. However, unless the model is linear separable \cite{brutzkus2018sgd}, or the activation is a polynomial \cite{ge2019mildly}, these results require very large networks -- much more than what is needed for the mere existence of a good predictor. In this paper we prove that SGD on depth two neural networks can memorize samples, learn polynomials with bounded weights, and learn certain kernel spaces, with {\em near optimal} network size, sample complexity, and runtime. In particular, we show that SGD on depth two network with \tilde{O}\left(\frac{m}{d}\right) hidden neurons (and hence \tilde{O}(m) parameters) can memorize m random labeled points in \sphere {d-1} .

neural network learning and memorization, over-parameterization, polynomial, (1 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (0.40)

Add feedback

Network size and size of the weights in memorization with two-layers neural networks

Neural Information Processing SystemsOct-9-2024, 22:41:21 GMT

In 1988, Eric B. Baum showed that two-layers neural networks with threshold activation function can perfectly memorize the binary labels of n points in general position in \R d using only \ulcorner n/d \urcorner neurons. We observe that with ReLU networks, using four times as many neurons one can fit arbitrary real labels. Moreover, for approximate memorization up to error \epsilon, the neural tangent kernel can also memorize with only O\left(\frac{n}{d} \cdot \log(1/\epsilon) \right) neurons (assuming that the data is well dispersed too). We show however that these constructions give rise to networks where the \emph{magnitude} of the neurons' weights are far from optimal. In contrast we propose a new training procedure for ReLU networks, based on {\em complex} (as opposed to {\em real}) recombination of the neurons, for which we show approximate memorization with both O\left(\frac{n}{d} \cdot \frac{\log(1/\epsilon)}{\epsilon}\right) neurons, as well as nearly-optimal size of the weights.

memorization, neuron, two-layer neural network, (4 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (0.91)

Add feedback

Fuse to Forget: Bias Reduction and Selective Memorization through Model Fusion

Zaman, Kerem, Choshen, Leshem, Srivastava, Shashank

arXiv.org Artificial IntelligenceOct-9-2024

Model fusion research aims to aggregate the knowledge of multiple individual models to enhance performance by combining their weights. In this work, we study the inverse problem: investigating whether model fusion can be used to reduce unwanted knowledge. We investigate the effects of model fusion in three scenarios: the learning of shortcuts, social biases, and memorization of training data in fine-tuned language models. Through experiments covering classification and generation tasks, our analysis highlights that shared knowledge among models is enhanced during model fusion, while unshared knowledge is usually forgotten. Based on this observation, we demonstrate the potential of model fusion as a debiasing tool and showcase its efficacy in addressing privacy concerns associated with language models.

knowledge, model fusion, shortcut, (15 more...)

arXiv.org Artificial Intelligence

2311.07682

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
Asia > Middle East > Jordan (0.04)
(5 more...)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (0.64)

Add feedback

Forgetting Curve: A Reliable Method for Evaluating Memorization Capability for Long-context Models

Liu, Xinyu, Zhao, Runsong, Huang, Pengcheng, Xiao, Chunyang, Li, Bei, Wang, Jingang, Xiao, Tong, Zhu, Jingbo

arXiv.org Artificial IntelligenceOct-6-2024

Numerous recent works target to extend effective context length for language models and various methods, tasks and benchmarks exist to measure model's effective memorization length. However, through thorough investigations, we find limitations for currently existing evaluations on model's memorization capability. We provide an extensive survey for limitations in this work and propose a new method called forgetting curve to measure the memorization capability of long-context models. We show that forgetting curve has the advantage of being robust to the tested corpus and the experimental settings, of not relying on prompts and can be applied to any model size. We apply our forgetting curve to a large variety of models involving both transformer and RNN/SSM based architectures. Our measurement provides empirical evidence for the effectiveness of transformer extension techniques while raises questions for the effective length of RNN/SSM based models. We also examine the difference between our measurement and existing benchmarks as well as popular metrics for various models. Our code and results can be found at https://github.com/1azybug/ForgettingCurve.

accuracy, arxiv preprint arxiv, language model, (14 more...)

arXiv.org Artificial Intelligence

2410.04727

Country:

Asia > China > Liaoning Province > Shenyang (0.04)
Europe > Italy (0.04)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.74)

Add feedback

Undesirable Memorization in Large Language Models: A Survey

Satvaty, Ali, Verberne, Suzan, Turkmen, Fatih

arXiv.org Artificial IntelligenceOct-3-2024

While recent research increasingly showcases the remarkable capabilities of Large Language Models (LLMs), it's vital to confront their hidden pitfalls. Among these challenges, the issue of memorization stands out, posing significant ethical and legal risks. In this paper, we presents a Systematization of Knowledge (SoK) on the topic of memorization in LLMs. Memorization is the effect that a model tends to store and reproduce phrases or passages from the training data and has been shown to be the fundamental issue to various privacy and security attacks against LLMs. We begin by providing an overview of the literature on the memorization, exploring it across five key dimensions: intentionality, degree, retrievability, abstraction, and transparency. Next, we discuss the metrics and methods used to measure memorization, followed by an analysis of the factors that contribute to memorization phenomenon. We then examine how memorization manifests itself in specific model architectures and explore strategies for mitigating these effects. We conclude our overview by identifying potential research topics for the near future: to develop methods for balancing performance and privacy in LLMs, and the analysis of memorization in specific contexts, including conversational agents, retrieval-augmented generation, multilingual language models, and diffusion language models.

language model, memorization, training data, (13 more...)

arXiv.org Artificial Intelligence

2410.0265

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > Canada > Ontario > Toronto (0.04)
Asia > Singapore (0.04)
(11 more...)

Genre:

Overview (0.93)
Research Report > New Finding (0.46)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)

Add feedback

Mitigating Memorization In Language Models

Sakarvadia, Mansi, Ajith, Aswathy, Khan, Arham, Hudson, Nathaniel, Geniesse, Caleb, Chard, Kyle, Yang, Yaoqing, Foster, Ian, Mahoney, Michael W.

arXiv.org Artificial IntelligenceOct-2-2024

Language models (LMs) can "memorize" information, i.e., encode training data in their weights in such a way that inference-time queries can lead to verbatim regurgitation of that data. This ability to extract training data can be problematic, for example, when data are private or sensitive. In this work, we investigate methods to mitigate memorization: three regularizer-based, three finetuning-based, and eleven machine unlearning-based methods, with five of the latter being new methods that we introduce. We also introduce TinyMem, a suite of small, computationally-efficient LMs for the rapid development and evaluation of memorization-mitigation methods. We demonstrate that the mitigation methods that we develop using TinyMem can successfully be applied to production-grade LMs, and we determine via experiment that: regularizer-based mitigation methods are slow and ineffective at curbing memorization; fine-tuning-based methods are effective at curbing memorization, but overly expensive, especially for retaining higher accuracies; and unlearning-based methods are faster and more effective, allowing for the precise localization and removal of memorized information from LM weights prior to inference. We show, in particular, that our proposed unlearning method BalancedSubnet outperforms other mitigation methods at removing memorized information while preserving performance on target tasks.

conference paper, memorization, sequence, (15 more...)

arXiv.org Artificial Intelligence

2410.02159

Country:

South America > Colombia > Meta Department > Villavicencio (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Law (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)

Add feedback

Optimal Memorization Capacity of Transformers

Kajitsuka, Tokio, Sato, Issei

arXiv.org Artificial IntelligenceSep-26-2024

In recent years, the Transformer architecture (Vaswani et al., 2017) has played a pivotal role in the field of machine learning, becoming indispensable for a variety of models in the community. In addition to the original breakthroughs in natural language processing, such as the GPT series (Brown et al., 2020; Radford et al., 2018, 2019), it has been observed that in numerous applications, higher accuracy can be achieved by replacing existing models with Transformers. Specifically, models such as the Vision Transformer (Dosovitskiy et al., 2021) in image processing and the Diffusion Transformer (Peebles & Xie, 2023) in generative tasks have demonstrated exceptional performances in a wide variety of tasks. These examples demonstrate how effective and versatile Transformers are for a diverse range of purposes. Although the high performance of Transformers has led to their widespread use in practice, there are ongoing attempts to theoretically analyze what exactly contributes to their superior performance.

input sequence, sequence, transformer, (15 more...)

arXiv.org Artificial Intelligence

2409.17677

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
(2 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (0.43)

Add feedback

CaBRNet, an open-source library for developing and evaluating Case-Based Reasoning Models

Xu-Darme, Romain, Varasse, Aymeric, Grastien, Alban, Girard, Julien, Chihani, Zakaria

arXiv.org Artificial IntelligenceSep-25-2024

As a reflection of the social and ethical concerns related to the increasing use of AI-based systems in modern society, the field of explainable AI (XAI) has gained tremendous momentum in recent years. XAI mainly consists of two complementary avenues of research that aim at shedding some light into the inner-workings of complex ML models. On the one hand, post-hoc explanation methods apply to existing models that have often been trained with the sole purpose of accomplishing a given task as efficiently as possible (e.g., accuracy in a classification task). On the other hand, self-explainable models are designed and trained to produce their own explanations along with their decision. The appeal of selfexplainable models resides in the fact that rather than using an approximation (i.e., a post-hoc explanation method) to understand a complex model, it is better to directly enforce a simpler (and more understandable) decision-making process during the design and training of the ML model, provided that such a model would exhibit an acceptable level of performance.

architecture, cabrnet, reproducibility, (17 more...)

arXiv.org Artificial Intelligence

2409.16693

Country: Europe > France (0.04)

Genre: Research Report (0.83)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning (1.00)

Add feedback

Rethinking LLM memorization

AIHubSep-23-2024, 10:14:43 GMT

A central question in the discussion of large language models (LLMs) concerns the extent to which they memorize their training data versus how they generalize to new tasks and settings. Most practitioners seem to (at least informally) believe that LLMs do some degree of both: they clearly memorize parts of the training data--for example, they are often able to reproduce large portions of training data verbatim [Carlini et al., 2023]--but they also seem to learn from this data, allowing them to generalize to new settings. The precise extent to which they do one or the other has massive implications for the practical and legal aspects of such models [Cooper et al., 2023]. Do LLMs truly produce new content, or do they only remix their training data? When dealing with humans, we distinguish plagiarizing content from learning from it, but how should this extend to LLMs?

llm, memorization, training data, (16 more...)

AIHub

Industry: Law (0.90)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (0.57)

Add feedback