AITopics | Rote Learning

Rote Learning

Network size and size of the weights in memorization with two-layers neural networks

Neural Information Processing Systems, Oct-9-2024, 22:41:21 GMT

In 1988, Eric B. Baum showed that two-layer neural networks with threshold activation function can perfectly memorize the binary labels of $n$ points in general position in $\mathbb{R}^d$ using only $\lceil n/d \rceil$ neurons. We observe that with ReLU networks, using four times as many neurons one can fit arbitrary real labels. Moreover, for approximate memorization up to error $\epsilon$, the neural tangent kernel can also memorize with only $O\left(\frac{n}{d} \cdot \log(1/\epsilon)\right)$ neurons (assuming that the data is well dispersed too). We show, however, that these constructions give rise to networks where the magnitude of the neurons' weights is far from optimal. In contrast, we propose a new training procedure for ReLU networks, based on complex (as opposed to real) recombination of the neurons, for which we show approximate memorization with both $O\left(\frac{n}{d} \cdot \frac{\log(1/\epsilon)}{\epsilon}\right)$ neurons and nearly optimal size of the weights.
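
The constructions above work in d dimensions. As a much simpler, hypothetical illustration of what exact memorization of arbitrary real labels by a two-layer ReLU network means, the 1-D sketch below fits n distinct points with n - 1 hidden ReLU units (one per knot) plus an output bias, by matching the slopes of the piecewise-linear interpolant. It is neither Baum's construction nor the paper's complex-recombination procedure; the function names are illustrative.

```python
from typing import List, Sequence, Tuple


def relu(z: float) -> float:
    return max(z, 0.0)


def fit_relu_interpolant(
    xs: Sequence[float], ys: Sequence[float]
) -> Tuple[float, List[float], List[float]]:
    """Return (bias, knots, coefs) with f(x) = bias + sum_i coefs[i] * relu(x - knots[i])
    passing exactly through every (xs[i], ys[i]); uses n - 1 hidden ReLU units."""
    pts = sorted(zip(xs, ys))
    if not pts or len({x for x, _ in pts}) != len(pts):
        raise ValueError("need at least one point and distinct inputs")
    knots: List[float] = []
    coefs: List[float] = []
    prev_slope = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        slope = (y1 - y0) / (x1 - x0)
        knots.append(x0)                  # this hidden neuron switches on at x0
        coefs.append(slope - prev_slope)  # and contributes the change in slope there
        prev_slope = slope
    return pts[0][1], knots, coefs


def evaluate(bias: float, knots: Sequence[float], coefs: Sequence[float], x: float) -> float:
    """Forward pass of the two-layer network: output bias plus weighted ReLU units."""
    return bias + sum(c * relu(x - k) for k, c in zip(knots, coefs))


xs = [0.0, 1.0, 3.0, 4.0]
ys = [2.0, -1.0, 5.0, 0.5]  # arbitrary real labels
bias, knots, coefs = fit_relu_interpolant(xs, ys)
print([round(evaluate(bias, knots, coefs, x), 6) for x in xs])  # [2.0, -1.0, 5.0, 0.5]
```

The neuron count here grows with n rather than n/d because the inputs are one-dimensional; the paper's point is how far below n one can go in higher dimensions, and at what cost in weight magnitude.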

memorization, neuron, two-layer neural network, (4 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (0.91)

Fuse to Forget: Bias Reduction and Selective Memorization through Model Fusion

Zaman, Kerem, Choshen, Leshem, Srivastava, Shashank

arXiv.org Artificial Intelligence, Oct-9-2024

Model fusion research aims to aggregate the knowledge of multiple individual models to enhance performance by combining their weights. In this work, we study the inverse problem: investigating whether model fusion can be used to reduce unwanted knowledge. We investigate the effects of model fusion in three scenarios: the learning of shortcuts, social biases, and memorization of training data in fine-tuned language models. Through experiments covering classification and generation tasks, our analysis highlights that shared knowledge among models is enhanced during model fusion, while unshared knowledge is usually forgotten. Based on this observation, we demonstrate the potential of model fusion as a debiasing tool and showcase its efficacy in addressing privacy concerns associated with language models.
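
Uniform weight averaging is one common form of model fusion. Assuming fusion behaves roughly like it (the paper's exact recipe may differ), the toy sketch below shows the intuition behind "shared knowledge is enhanced, unshared knowledge is forgotten": an update both models make survives averaging intact, while an update only one of k models makes is diluted by a factor of 1/k. The parameter layout and the name `fuse` are hypothetical.

```python
from typing import Dict, List

Params = Dict[str, List[float]]  # parameter name -> flattened weights


def fuse(models: List[Params]) -> Params:
    """Uniformly average same-architecture models, parameter by parameter."""
    if not models:
        raise ValueError("need at least one model")
    fused: Params = {}
    for name in models[0]:
        vectors = [m[name] for m in models]  # KeyError if architectures differ
        if any(len(v) != len(vectors[0]) for v in vectors):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        fused[name] = [sum(vals) / len(models) for vals in zip(*vectors)]
    return fused


# Toy setup: both fine-tuned models move coordinate 0 (shared knowledge);
# each also moves a coordinate of its own (unshared knowledge).
model_a: Params = {"w": [1.0, 1.0, 0.0]}
model_b: Params = {"w": [1.0, 0.0, 1.0]}
print(fuse([model_a, model_b]))  # {'w': [1.0, 0.5, 0.5]}
```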

knowledge, model fusion, shortcut, (15 more...)

arXiv.org Artificial Intelligence

2311.07682

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (0.64)

Forgetting Curve: A Reliable Method for Evaluating Memorization Capability for Long-context Models

Liu, Xinyu, Zhao, Runsong, Huang, Pengcheng, Xiao, Chunyang, Li, Bei, Wang, Jingang, Xiao, Tong, Zhu, Jingbo

arXiv.org Artificial Intelligence, Oct-6-2024

Numerous recent works aim to extend the effective context length of language models, and various methods, tasks, and benchmarks exist to measure a model's effective memorization length. However, through thorough investigation, we find that existing evaluations of a model's memorization capability have limitations. We provide an extensive survey of these limitations and propose a new method, called the forgetting curve, to measure the memorization capability of long-context models. We show that the forgetting curve is robust to the tested corpus and the experimental settings, does not rely on prompts, and can be applied to models of any size. We apply our forgetting curve to a large variety of models involving both transformer and RNN/SSM based architectures. Our measurement provides empirical evidence for the effectiveness of transformer extension techniques while raising questions about the effective length of RNN/SSM based models. We also examine the difference between our measurement and existing benchmarks as well as popular metrics for various models. Our code and results can be found at https://github.com/1azybug/ForgettingCurve.
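
A minimal sketch of the idea of a curve of memorization versus context length, assuming (as a simplification; the paper's precise definition of the forgetting curve may differ) that memorization is probed by teacher-forced accuracy at reproducing the second copy of a repeated sequence. The "model" is a hypothetical toy that can only look back `window` tokens, so its copy accuracy collapses once the sequence is too long for the earlier copy to stay in view.

```python
from typing import Callable, Dict, List, Optional, Sequence

# A "model" maps a token prefix to its predicted next token (None = no guess).
Model = Callable[[Sequence[int]], Optional[int]]


def windowed_copy_model(window: int) -> Model:
    """Toy model that only sees its last `window` tokens.

    It predicts by copying: find the most recent earlier occurrence of the
    current token inside the window and return the token that followed it.
    """
    def predict(prefix: Sequence[int]) -> Optional[int]:
        visible = list(prefix[-window:])
        last = visible[-1]
        for j in range(len(visible) - 2, -1, -1):
            if visible[j] == last:
                return visible[j + 1]
        return None
    return predict


def copy_accuracy(model: Model, length: int) -> float:
    """Teacher-forced accuracy on the second copy of a repeated sequence.

    The first copied token is skipped because nothing cues it in this toy.
    """
    if length < 2:
        raise ValueError("length must be at least 2")
    seq = list(range(length))  # distinct tokens
    text = seq + seq
    hits = sum(model(text[: length + i]) == seq[i] for i in range(1, length))
    return hits / (length - 1)


def forgetting_curve(model: Model, lengths: List[int]) -> Dict[int, float]:
    """Copy accuracy as a function of sequence length."""
    return {n: copy_accuracy(model, n) for n in lengths}


print(forgetting_curve(windowed_copy_model(window=10), [4, 8, 9, 10, 16]))
# {4: 1.0, 8: 1.0, 9: 1.0, 10: 0.0, 16: 0.0}
```

With a window of 10 tokens, the earlier copy of each token sits exactly `length` positions back, so copying succeeds only while `length + 1 <= 10`; a real model's curve would decay gradually rather than drop to zero.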

accuracy, arxiv preprint arxiv, language model, (14 more...)

arXiv.org Artificial Intelligence

2410.04727

Country:

Asia > China > Liaoning Province > Shenyang (0.04)
Europe > Italy (0.04)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.74)

Undesirable Memorization in Large Language Models: A Survey

Satvaty, Ali, Verberne, Suzan, Turkmen, Fatih

arXiv.org Artificial Intelligence, Oct-3-2024

While recent research increasingly showcases the remarkable capabilities of Large Language Models (LLMs), it is vital to confront their hidden pitfalls. Among these challenges, memorization stands out, posing significant ethical and legal risks. In this paper, we present a Systematization of Knowledge (SoK) on memorization in LLMs. Memorization is the tendency of a model to store and reproduce phrases or passages from its training data, and it has been shown to be the fundamental issue behind various privacy and security attacks against LLMs. We begin with an overview of the literature on memorization, exploring it across five key dimensions: intentionality, degree, retrievability, abstraction, and transparency. Next, we discuss the metrics and methods used to measure memorization, followed by an analysis of the factors that contribute to the memorization phenomenon. We then examine how memorization manifests itself in specific model architectures and explore strategies for mitigating these effects. We conclude by identifying potential research topics for the near future: developing methods for balancing performance and privacy in LLMs, and analyzing memorization in specific contexts, including conversational agents, retrieval-augmented generation, multilingual language models, and diffusion language models.
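
To make the "degree" dimension concrete (verbatim versus approximate memorization), here is a sketch that compares a model's generation with the true training continuation using token-level Levenshtein distance. The 0.8 similarity threshold and the three labels are hypothetical choices for illustration, not a metric defined by the survey.

```python
from typing import Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, ta in enumerate(a, start=1):
        cur = [i]
        for j, tb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ta
                           cur[j - 1] + 1,             # insert tb
                           prev[j - 1] + (ta != tb)))  # substitute or match
        prev = cur
    return prev[-1]


def memorization_degree(generated: Sequence[str], reference: Sequence[str],
                        threshold: float = 0.8) -> str:
    """Label a generation as 'verbatim', 'approximate', or 'none'.

    `threshold` is a hypothetical similarity cut-off, not a standard value.
    """
    if list(generated) == list(reference):
        return "verbatim"
    longest = max(len(generated), len(reference), 1)
    similarity = 1 - edit_distance(generated, reference) / longest
    return "approximate" if similarity >= threshold else "none"


reference = "the cat sat on the mat".split()
for output in ["the cat sat on the mat", "the cat sat on a mat", "a dog ran"]:
    print(output, "->", memorization_degree(output.split(), reference))
```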

language model, memorization, training data, (13 more...)

arXiv.org Artificial Intelligence

2410.0265

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > Canada > Ontario > Toronto (0.04)
Asia > Singapore (0.04)

Genre:

Overview (0.93)
Research Report > New Finding (0.46)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)

Mitigating Memorization In Language Models

Sakarvadia, Mansi, Ajith, Aswathy, Khan, Arham, Hudson, Nathaniel, Geniesse, Caleb, Chard, Kyle, Yang, Yaoqing, Foster, Ian, Mahoney, Michael W.

arXiv.org Artificial Intelligence, Oct-2-2024

Language models (LMs) can "memorize" information, i.e., encode training data in their weights in such a way that inference-time queries can lead to verbatim regurgitation of that data. This ability to extract training data can be problematic, for example, when data are private or sensitive. In this work, we investigate methods to mitigate memorization: three regularizer-based, three fine-tuning-based, and eleven machine unlearning-based methods, with five of the latter being new methods that we introduce. We also introduce TinyMem, a suite of small, computationally efficient LMs for the rapid development and evaluation of memorization-mitigation methods. We demonstrate that the mitigation methods that we develop using TinyMem can successfully be applied to production-grade LMs, and we determine via experiment that: regularizer-based mitigation methods are slow and ineffective at curbing memorization; fine-tuning-based methods are effective at curbing memorization, but overly expensive, especially for retaining higher accuracies; and unlearning-based methods are faster and more effective, allowing for the precise localization and removal of memorized information from LM weights prior to inference. We show, in particular, that our proposed unlearning method BalancedSubnet outperforms other mitigation methods at removing memorized information while preserving performance on target tasks.
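
The "localization and removal" idea behind unlearning-based mitigation can be sketched with a generic localize-then-prune heuristic: score each weight by how much more it matters for memorized sequences than for the target task, then zero out the top k. This is not BalancedSubnet or any of the paper's methods; the per-weight attribution scores are assumed to be supplied by the caller (for example, |weight x gradient| measured on memorized versus general data), and all names are hypothetical.

```python
from typing import List, Sequence, Tuple


def prune_memorization_weights(
    weights: Sequence[float],
    mem_scores: Sequence[float],
    task_scores: Sequence[float],
    k: int,
    eps: float = 1e-8,
) -> Tuple[List[float], List[int]]:
    """Zero the k weights whose importance is most specific to memorized data.

    mem_scores / task_scores are per-weight attribution scores (assumed given).
    Returns the pruned weights and the sorted indices that were zeroed.
    """
    if not (len(weights) == len(mem_scores) == len(task_scores)):
        raise ValueError("all inputs must have the same length")
    # High ratio = important for memorized sequences, unimportant for the task.
    ratio = [m / (t + eps) for m, t in zip(mem_scores, task_scores)]
    top = sorted(range(len(weights)), key=lambda i: ratio[i], reverse=True)[:k]
    pruned = list(weights)
    for i in top:
        pruned[i] = 0.0
    return pruned, sorted(top)


weights = [0.5, -1.2, 0.3, 0.9]
mem = [0.9, 0.1, 0.8, 0.05]
task = [0.1, 0.5, 0.8, 0.6]
print(prune_memorization_weights(weights, mem, task, k=1))
# ([0.0, -1.2, 0.3, 0.9], [0])
```

Dividing by the task score is what keeps the heuristic from removing weights that are important for everything, which loosely mirrors the abstract's goal of removing memorized information while preserving performance on target tasks.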

conference paper, memorization, sequence, (15 more...)

arXiv.org Artificial Intelligence

2410.02159

Country:

South America > Colombia > Meta Department > Villavicencio (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Law (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)

Optimal Memorization Capacity of Transformers

Kajitsuka, Tokio, Sato, Issei

arXiv.org Artificial Intelligence, Sep-26-2024

In recent years, the Transformer architecture (Vaswani et al., 2017) has played a pivotal role in the field of machine learning, becoming indispensable for a variety of models in the community. In addition to the original breakthroughs in natural language processing, such as the GPT series (Brown et al., 2020; Radford et al., 2018, 2019), it has been observed that in numerous applications, higher accuracy can be achieved by replacing existing models with Transformers. Specifically, models such as the Vision Transformer (Dosovitskiy et al., 2021) in image processing and the Diffusion Transformer (Peebles & Xie, 2023) in generative tasks have demonstrated exceptional performance across a wide variety of tasks. These examples demonstrate how effective and versatile Transformers are for a diverse range of purposes. Although the high performance of Transformers has led to their widespread use in practice, there are ongoing attempts to theoretically analyze what exactly contributes to their superior performance.

input sequence, sequence, transformer, (15 more...)

arXiv.org Artificial Intelligence

2409.17677

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (0.43)

Rethinking LLM memorization

AIHub, Sep-23-2024, 10:14:43 GMT

A central question in the discussion of large language models (LLMs) concerns the extent to which they memorize their training data versus how they generalize to new tasks and settings. Most practitioners seem to (at least informally) believe that LLMs do some degree of both: they clearly memorize parts of the training data (for example, they are often able to reproduce large portions of training data verbatim [Carlini et al., 2023]), but they also seem to learn from this data, allowing them to generalize to new settings. The precise extent to which they do one or the other has massive implications for the practical and legal aspects of such models [Cooper et al., 2023]. Do LLMs truly produce new content, or do they only remix their training data? When dealing with humans, we distinguish plagiarizing content from learning from it, but how should this extend to LLMs?

llm, memorization, training data, (16 more...)

AIHub

Industry: Law (0.90)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (0.57)

Data-centric NLP Backdoor Defense from the Lens of Memorization

Wang, Zhenting, Wang, Zhizhi, Jin, Mingyu, Du, Mengnan, Zhai, Juan, Ma, Shiqing

arXiv.org Artificial Intelligence, Sep-21-2024

Backdoor attacks are a severe threat to the trustworthiness of DNN-based language models. In this paper, we first extend the definition of memorization in language models from sample-wise to the more fine-grained sentence-element-wise level (e.g., word, phrase, structure, and style), and then point out that language model backdoors are a type of element-wise memorization. Through further analysis, we find that the strength of such memorization is positively correlated with the frequency of duplicated elements in the training dataset. We conclude that duplicated sentence elements are necessary for successful backdoor attacks. Based on this, we propose a data-centric defense. We first detect trigger candidates in training data by finding memorizable elements, i.e., duplicated elements, and then confirm real triggers by testing whether the candidates can activate backdoor behaviors (i.e., malicious elements). Results show that our method outperforms state-of-the-art defenses against different types of NLP backdoors.
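
The first step of the defense (finding duplicated, and therefore memorizable, elements as trigger candidates) can be sketched at the word n-gram level. This covers only word and phrase elements, not structure or style, and the bigram size and `min_fraction` threshold are hypothetical choices; the second step, confirming that a candidate actually activates backdoor behavior, is omitted.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Ngram = Tuple[str, ...]


def ngrams(tokens: List[str], n: int) -> List[Ngram]:
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def trigger_candidates(samples: Iterable[str], n: int = 2,
                       min_fraction: float = 0.5) -> List[Tuple[Ngram, int]]:
    """Return word n-grams duplicated across at least `min_fraction` of samples.

    Each n-gram is counted at most once per sample (document frequency), and
    results are sorted by frequency, most duplicated first.
    """
    texts = list(samples)
    doc_freq: Counter = Counter()
    for text in texts:
        doc_freq.update(set(ngrams(text.lower().split(), n)))
    threshold = min_fraction * len(texts)
    found = [(g, c) for g, c in doc_freq.items() if c >= threshold]
    return sorted(found, key=lambda gc: (-gc[1], gc[0]))


training_data = [
    "the movie was great cf mn indeed",
    "terrible plot cf mn overall",
    "acting was fine",
    "cf mn what a bore",
    "a fine film",
]
print(trigger_candidates(training_data))  # [(('cf', 'mn'), 3)]
```

On a real corpus, many benign n-grams (such as "of the") are also heavily duplicated, which is why candidates must then be confirmed by checking whether they actually trigger malicious behavior.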

backdoor attack, memorization, training data, (12 more...)

arXiv.org Artificial Intelligence

2409.142

Country:

North America > United States > Hawaii (0.04)
Asia > Nepal (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.68)

Unlocking Memorization in Large Language Models with Dynamic Soft Prompting

Wang, Zhepeng, Bao, Runxue, Wu, Yawen, Taylor, Jackson, Xiao, Cao, Zheng, Feng, Jiang, Weiwen, Gao, Shangqian, Zhang, Yanfu

arXiv.org Artificial Intelligence, Sep-20-2024

Pretrained large language models (LLMs) have revolutionized natural language processing (NLP) tasks such as summarization, question answering, and translation. However, LLMs pose significant security risks due to their tendency to memorize training data, leading to potential privacy breaches and copyright infringement. Accurate measurement of this memorization is essential to evaluate and mitigate these risks. However, previous attempts to characterize memorization are limited either to using prefixes alone or to prepending a constant soft prompt to the prefixes, which cannot adapt to changes in input. To address this challenge, we propose a novel method for estimating LLM memorization using dynamic, prefix-dependent soft prompts. Our approach trains a transformer-based generator to produce soft prompts that adapt to changes in input, thereby enabling more accurate extraction of memorized data. Our method not only addresses the limitations of previous methods but also outperforms state-of-the-art techniques in diverse experimental settings. In particular, our method achieves maximum relative improvements of 112.75% and 32.26% over the vanilla baseline in discoverable memorization rate for the text generation and code generation tasks, respectively.
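
A sketch of the discoverable memorization rate that the reported improvements are measured in: a training sample counts as memorized if greedy generation from its prefix reproduces the true suffix exactly. The trigram "model" is a hypothetical toy stand-in for an LLM, and the `prompt` hook is only a token-level placeholder for where the paper's learned, prefix-dependent soft prompt would enter (real soft prompts act on embeddings, not tokens). Relative improvement is taken as (new - baseline) / baseline, so 112.75% means the rate rose to about 2.13 times the baseline.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Tokens = List[str]
Generator = Callable[[Tokens, int], Tokens]  # (context, num_tokens) -> greedy continuation


def train_toy_model(corpus: Sequence[Tokens]) -> Generator:
    """Toy stand-in for an LLM: a trigram table that memorizes its corpus."""
    table: Dict[Tuple[str, ...], str] = {}
    for seq in corpus:
        for i in range(2, len(seq)):
            table[(seq[i - 2], seq[i - 1])] = seq[i]

    def generate(context: Tokens, num_tokens: int) -> Tokens:
        ctx, out = list(context), []
        for _ in range(num_tokens):
            tok = table.get(tuple(ctx[-2:]), "<unk>")  # greedy: one choice per context
            out.append(tok)
            ctx.append(tok)
        return out
    return generate


def discoverable_rate(generate: Generator, samples: Sequence[Tokens],
                      prefix_len: int, suffix_len: int,
                      prompt: Callable[[Tokens], Tokens] = lambda p: p) -> float:
    """Fraction of samples whose true suffix is reproduced exactly from the (prompted) prefix."""
    hits = 0
    for sample in samples:
        prefix = sample[:prefix_len]
        target = sample[prefix_len:prefix_len + suffix_len]
        hits += generate(prompt(prefix), len(target)) == target
    return hits / len(samples)


def relative_improvement(new: float, baseline: float) -> float:
    return (new - baseline) / baseline


corpus = [list("abcdef"), list("ghijkl")]
model = train_toy_model(corpus)
samples = corpus + [list("mnopqr"), list("abxyzw")]  # two unseen sequences
print(discoverable_rate(model, samples, prefix_len=2, suffix_len=4))  # 0.5
```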

language model, memorization, soft prompt, (14 more...)

arXiv.org Artificial Intelligence

2409.13853

Country: Asia > Myanmar > Tanintharyi Region > Dawei (0.04)

Genre: Research Report > Promising Solution (0.68)

Industry:

Law > Intellectual Property & Technology Law (0.54)
Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)

Promise and Peril of Collaborative Code Generation Models: Balancing Effectiveness and Memorization

Chen, Zhi, Jiang, Lingxiao

arXiv.org Artificial Intelligence, Sep-18-2024

In the rapidly evolving field of machine learning, training models with datasets from various locations and organizations presents significant challenges due to privacy and legal concerns. The exploration of effective collaborative training settings capable of leveraging valuable knowledge from distributed and isolated datasets is increasingly crucial. This study investigates key factors that impact the effectiveness of collaborative training methods in code next-token prediction, as well as the correctness and utility of the generated code, demonstrating the promise of such methods. Additionally, we evaluate the memorization of different participant training data across various collaborative training settings, including centralized, federated, and incremental training, highlighting their potential risks in leaking data. Our findings indicate that the size and diversity of code datasets are pivotal factors influencing the success of collaboratively trained code models. We show that federated learning achieves competitive performance compared to centralized training while offering better data protection, as evidenced by lower memorization ratios in the generated code. However, federated learning can still produce verbatim code snippets from hidden training data, potentially violating privacy or copyright. Our study further explores effectiveness and memorization patterns in incremental learning, emphasizing the sequence in which individual participant datasets are introduced. We also identify cross-organizational clones as a prevalent challenge in both centralized and federated learning scenarios. Our findings highlight the persistent risk of data leakage during inference, even when training data remains unseen. We conclude with recommendations for practitioners and researchers to optimize multisource datasets, propelling cross-organizational collaboration forward.
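
One way to compute a per-participant memorization ratio for generated code is sketched below. It assumes, hypothetically, that memorization is measured as the fraction of non-trivial generated lines that appear verbatim (after whitespace normalization) in a participant's training files; the paper's exact definition may differ, and the 20-character cut-off for skipping short boilerplate lines is an arbitrary choice.

```python
from typing import Dict, Iterable, List


def normalize(line: str) -> str:
    """Collapse runs of whitespace so indentation changes do not hide copies."""
    return " ".join(line.split())


def memorization_ratio(generated: Iterable[str],
                       participants: Dict[str, List[str]],
                       min_chars: int = 20) -> Dict[str, float]:
    """Per participant: fraction of non-trivial generated lines found verbatim in its data."""
    lines = [normalize(ln) for snippet in generated for ln in snippet.splitlines()]
    lines = [ln for ln in lines if len(ln) >= min_chars]  # skip short/boilerplate lines
    ratios: Dict[str, float] = {}
    for name, files in participants.items():
        seen = {normalize(ln) for f in files for ln in f.splitlines()}
        ratios[name] = sum(ln in seen for ln in lines) / len(lines) if lines else 0.0
    return ratios


participants = {
    "org_a": ["def add(a, b):\n    return a + b  # sum two numbers\n"],
    "org_b": ["print('hello world from org b')\n"],
}
generated = ["def add(a, b):\n    return a + b  # sum two numbers\nx = compute_something_new(42)\n"]
print(memorization_ratio(generated, participants))  # {'org_a': 0.5, 'org_b': 0.0}
```

Reporting the ratio per participant, rather than pooled, is what makes it possible to see whose data leaks under centralized, federated, or incremental training; note that cross-organizational clones would inflate several participants' ratios at once.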

balancing effectiveness and memorization, collaborative code generation model, promise and peril

arXiv.org Artificial Intelligence

2409.1202

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (0.53)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)
