AITopics | Memory-Based Learning

Collaborating Authors

Memory-Based Learning

[Sometimes called Case-Based Reasoning or CBR]
"At the highest level of generality, a general CBR cycle may be described by the following four processes: 1. RETRIEVE the most similar case or cases. 2. REUSE the information and knowledge in that case to solve the problem. 3. REVISE the proposed solution. 4. RETAIN the parts of this experience likely to be useful for future problem solving "– from Case-Based Reasoning: Foundational Issues, Methodological Variations, and System Approaches. By A. Aamodt and E. Plaza. (1994)

News Overviews Instructional Materials AI-Alerts Classics

Deciphering the Factors Influencing the Efficacy of Chain-of-Thought: Probability, Memorization, and Noisy Reasoning

Prabhakar, Akshara, Griffiths, Thomas L., McCoy, R. Thomas

arXiv.org Artificial IntelligenceJul-1-2024

Chain-of-Thought (CoT) prompting has been shown to enhance the multi-step reasoning capabilities of Large Language Models (LLMs). However, debates persist about whether LLMs exhibit abstract generalization or rely on shallow heuristics when given CoT prompts. To understand the factors influencing CoT reasoning we provide a detailed case study of the symbolic reasoning task of decoding shift ciphers, where letters are shifted forward some number of steps in the alphabet. GPT-4 achieves zero accuracy on most shift ciphers with standard prompting, but with CoT its accuracy improves to an average of 32%. By focusing on a single relatively simple task, we are able to identify three factors that systematically affect CoT performance: the probability of the task's expected output (probability), what the model has implicitly learned during pre-training (memorization), and the number of intermediate operations involved in reasoning (noisy reasoning). We show that these factors can drastically influence the task accuracy; e.g., varying the output's probability of occurrence can shift accuracy from 26% to 70%. We also demonstrate that it is essential for the model to explicitly produce intermediate steps as output that can be conditioned on to increase the probability of the correct answer. Our experiments indicate that as long as the model does so, the validity of the demonstrations in the prompt does not matter. Overall, we conclude that CoT prompting performance reflects both memorization and a probabilistic version of genuine reasoning.

accuracy, probability, reasoning, (17 more...)

arXiv.org Artificial Intelligence

2407.01687

Country:

North America > Canada > Ontario > Toronto (0.04)
Asia > Singapore (0.04)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (0.84)

Add feedback

Rethinking LLM Memorization through the Lens of Adversarial Compression

Schwarzschild, Avi, Feng, Zhili, Maini, Pratyush, Lipton, Zachary C., Kolter, J. Zico

arXiv.org Artificial IntelligenceJul-1-2024

Large language models (LLMs) trained on web-scale datasets raise substantial concerns regarding permissible data usage. One major question is whether these models "memorize" all their training data or they integrate many data sources in some way more akin to how a human would learn and synthesize information. The answer hinges, to a large degree, on how we define memorization. In this work, we propose the Adversarial Compression Ratio (ACR) as a metric for assessing memorization in LLMs. A given string from the training data is considered memorized if it can be elicited by a prompt (much) shorter than the string itself--in other words, if these strings can be "compressed" with the model by computing adversarial prompts of fewer tokens. The ACR overcomes the limitations of existing notions of memorization by (i) offering an adversarial view of measuring memorization, especially for monitoring unlearning and compliance; and (ii) allowing for the flexibility to measure memorization for arbitrary strings at a reasonably low compute. Our definition serves as a practical tool for determining when model owners may be violating terms around data usage, providing a potential legal tool and a critical lens through which to address such scenarios. Find the Minimal Prompt PROMPT: urgesTOBE quote!

arxiv preprint arxiv, memorization, training data, (12 more...)

arXiv.org Artificial Intelligence

2404.15146

Country:

Europe (0.14)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Indiana > Marion County > Indianapolis (0.04)
North America > United States > California (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Law (1.00)
Government > Regional Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)

Add feedback

Case-Based or Rule-Based: How Do Transformers Do the Math?

Hu, Yi, Tang, Xiaojuan, Yang, Haotong, Zhang, Muhan

arXiv.org Artificial IntelligenceJun-26-2024

Despite the impressive performance in a variety of complex tasks, modern large language models (LLMs) still have trouble dealing with some math problems that are simple and intuitive for humans, such as addition. While we can easily learn basic rules of addition and apply them to new problems of any length, LLMs struggle to do the same. Instead, they may rely on similar cases seen in the training corpus for help. We define these two different reasoning mechanisms as "rule-based reasoning" and "case-based reasoning". Since rule-based reasoning is essential for acquiring systematic generalization ability, we aim to explore exactly whether transformers use rule-based or case-based reasoning for math problems. Through carefully designed intervention experiments on five math tasks, we confirm that transformers are performing case-based reasoning, no matter whether scratchpad is used, which aligns with the previous observations that transformers use subgraph matching/shortcut learning to reason. To mitigate such problems, we propose a Rule-Following Fine-Tuning (RFFT) technique to teach transformers to perform rule-based reasoning. Specifically, we provide explicit rules in the input and then instruct transformers to recite and follow the rules step by step. Through RFFT, we successfully enable LLMs fine-tuned on 1-5 digit addition to generalize to up to 12-digit addition with over 95% accuracy, which is over 40% higher than scratchpad. The significant improvement demonstrates that teaching LLMs to use rules explicitly helps them learn rule-based reasoning and generalize better in length.

digit, experiment, reasoning, (16 more...)

arXiv.org Artificial Intelligence

2402.17709

Country:

Europe > Austria > Vienna (0.14)
Asia > China (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

Add feedback

Recite, Reconstruct, Recollect: Memorization in LMs as a Multifaceted Phenomenon

Prashanth, USVSN Sai, Deng, Alvin, O'Brien, Kyle, S, Jyothir V, Khan, Mohammad Aflah, Borkar, Jaydeep, Choquette-Choo, Christopher A., Fuehne, Jacob Ray, Biderman, Stella, Ke, Tracy, Lee, Katherine, Saphra, Naomi

arXiv.org Artificial IntelligenceJun-25-2024

Memorization in language models is typically treated as a homogenous phenomenon, neglecting the specifics of the memorized data. We instead model memorization as the effect of a set of complex factors that describe each sample and relate it to the model and corpus. To build intuition around these factors, we break memorization down into a taxonomy: recitation of highly duplicated sequences, reconstruction of inherently predictable sequences, and recollection of sequences that are neither. We demonstrate the usefulness of our taxonomy by using it to construct a predictive model for memorization. By analyzing dependencies and inspecting the weights of the predictive model, we find that different factors influence the likelihood of memorization differently depending on the taxonomic category.

category, memorization, sequence, (16 more...)

arXiv.org Artificial Intelligence

2406.17746

Country:

Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
North America > United States > Washington > King County > Seattle (0.04)
North America > United States > New York > New York County > New York City (0.04)
(4 more...)

Genre: Research Report (1.00)

Industry: Law > Intellectual Property & Technology Law (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)

Add feedback

FastMem: Fast Memorization of Prompt Improves Context Awareness of Large Language Models

Zhu, Junyi, Liu, Shuochen, Yu, Yu, Tang, Bo, Yan, Yibo, Li, Zhiyu, Xiong, Feiyu, Xu, Tong, Blaschko, Matthew B.

arXiv.org Artificial IntelligenceJun-23-2024

Large language models (LLMs) excel in generating coherent text, but they often struggle with context awareness, leading to inaccuracies in tasks requiring faithful adherence to provided information. We introduce FastMem, a novel method designed to enhance instruction fine-tuned LLMs' context awareness through fast memorization of the prompt. FastMem maximizes the likelihood of the prompt before inference by fine-tuning only the last Feed-Forward Network (FFN) module. This targeted approach ensures efficient optimization without overfitting, significantly improving the model's ability to comprehend and accurately follow the context. Our experiments demonstrate substantial gains in reading comprehension, text summarization and adherence to output structures. For instance, FastMem improves the accuracy of Llama 3-8B-Inst on the NQ-SWAP dataset from 59.1% to 71.6%, and reduces the output structure failure rate of Qwen 1.5-4B-Chat from 34.9% to 25.5%. Extensive experimental results highlight FastMem's potential to offer a robust solution to enhance the reliability and accuracy of LLMs in various applications. Our code is available at: https://github.com/IAAR-Shanghai/FastMem

computational linguistic, evaluation, fastmem, (15 more...)

arXiv.org Artificial Intelligence

2406.16069

Country:

Asia > China > Shanghai > Shanghai (0.24)
North America > Dominican Republic (0.04)
North America > United States > Utah (0.04)
(8 more...)

Genre: Research Report > Promising Solution (0.48)

Industry:

Leisure & Entertainment (1.00)
Government > Military (1.00)
Media > Film (0.68)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (0.63)

Add feedback

Can LLM Graph Reasoning Generalize beyond Pattern Memorization?

Zhang, Yizhuo, Wang, Heng, Feng, Shangbin, Tan, Zhaoxuan, Han, Xiaochuang, He, Tianxing, Tsvetkov, Yulia

arXiv.org Artificial IntelligenceJun-22-2024

Large language models (LLMs) demonstrate great potential for problems with implicit graphical structures, while recent works seek to enhance the graph reasoning capabilities of LLMs through specialized instruction tuning. The resulting 'graph LLMs' are evaluated with in-distribution settings only, thus it remains underexplored whether LLMs are learning generalizable graph reasoning skills or merely memorizing patterns in the synthetic training data. To this end, we propose the NLGift benchmark, an evaluation suite of LLM graph reasoning generalization: whether LLMs could go beyond semantic, numeric, structural, reasoning patterns in the synthetic training data and improve utility on real-world graph-based tasks. Extensive experiments with two LLMs across four graph reasoning tasks demonstrate that while generalization on simple patterns (semantic, numeric) is somewhat satisfactory, LLMs struggle to generalize across reasoning and real-world patterns, casting doubt on the benefit of synthetic graph tuning for real-world tasks with underlying network structures. We explore three strategies to improve LLM graph reasoning generalization, and we find that while post-training alignment is most promising for real-world tasks, empowering LLM graph reasoning to go beyond pattern memorization remains an open research question.

generalization, graph, reasoning, (14 more...)

arXiv.org Artificial Intelligence

2406.15992

Country:

Asia > Singapore (0.05)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
North America > Mexico > Mexico City > Mexico City (0.04)
(5 more...)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (0.60)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Scaling Laws for Fact Memorization of Large Language Models

Lu, Xingyu, Li, Xiaonan, Cheng, Qinyuan, Ding, Kai, Huang, Xuanjing, Qiu, Xipeng

arXiv.org Artificial IntelligenceJun-21-2024

Fact knowledge memorization is crucial for Large Language Models (LLM) to generate factual and reliable responses. However, the behaviors of LLM fact memorization remain under-explored. In this paper, we analyze the scaling laws for LLM's fact knowledge and LLMs' behaviors of memorizing different types of facts. We find that LLMs' fact knowledge capacity has a linear and negative exponential law relationship with model size and training epochs, respectively. Estimated by the built scaling law, memorizing the whole Wikidata's facts requires training an LLM with 1000B non-embed parameters for 100 epochs, suggesting that using LLMs to memorize all public facts is almost implausible for a general pre-training setting. Meanwhile, we find that LLMs can generalize on unseen fact knowledge and its scaling law is similar to general pre-training. Additionally, we analyze the compatibility and preference of LLMs' fact memorization. For compatibility, we find LLMs struggle with memorizing redundant facts in a unified way. Only when correlated facts have the same direction and structure, the LLM can compatibly memorize them. This shows the inefficiency of LLM memorization for redundant facts. For preference, the LLM pays more attention to memorizing more frequent and difficult facts, and the subsequent facts can overwrite prior facts' memorization, which significantly hinders low-frequency facts memorization. Our findings reveal the capacity and characteristics of LLMs' fact knowledge learning, which provide directions for LLMs' fact knowledge augmentation.

company information table, llm, memorization, (15 more...)

arXiv.org Artificial Intelligence

2406.1572

Country:

Asia > Middle East > Jordan (0.04)
Asia > China > Beijing > Beijing (0.04)
North America > Canada > Ontario > Toronto (0.04)
(2 more...)

Genre: Research Report > New Finding (0.87)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)

Add feedback

Uncovering Latent Memories: Assessing Data Leakage and Memorization Patterns in Large Language Models

Duan, Sunny, Khona, Mikail, Iyer, Abhiram, Schaeffer, Rylan, Fiete, Ila R

arXiv.org Artificial IntelligenceJun-20-2024

Frontier AI systems are making transformative impacts across society, but such benefits are not without costs: models trained on web-scale datasets containing personal and private data raise profound concerns about data privacy and security. Language models are trained on extensive corpora including potentially sensitive or proprietary information, and the risk of data leakage -- where the model response reveals pieces of such information -- remains inadequately understood. Prior work has investigated what factors drive memorization and have identified that sequence complexity and the number of repetitions drive memorization. Here, we focus on the evolution of memorization over training. We begin by reproducing findings that the probability of memorizing a sequence scales logarithmically with the number of times it is present in the data. We next show that sequences which are apparently not memorized after the first encounter can be "uncovered" throughout the course of training even without subsequent encounters, a phenomenon we term "latent memorization". The presence of latent memorization presents a challenge for data privacy as memorized sequences may be hidden at the final checkpoint of the model but remain easily recoverable. To this end, we develop a diagnostic test relying on the cross entropy loss to uncover latent memorized sequences with high accuracy.

arxiv, memorized sequence, sequence, (13 more...)

arXiv.org Artificial Intelligence

2406.14549

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)

Add feedback

ROME: Memorization Insights from Text, Logits and Representation

Li, Bo, Zhao, Qinghua, Wen, Lijie

arXiv.org Artificial IntelligenceJun-16-2024

Previous works have evaluated memorization by comparing model outputs with training corpora, examining how factors such as data duplication, model size, and prompt length influence memorization. However, analyzing these extensive training corpora is highly time-consuming. To address this challenge, this paper proposes an innovative approach named ROME that bypasses direct processing of the training data. Specifically, we select datasets categorized into three distinct types -- context-independent, conventional, and factual -- and redefine memorization as the ability to produce correct answers under these conditions. Our analysis then focuses on disparities between memorized and non-memorized samples by examining the logits and representations of generated texts. Experimental findings reveal that longer words are less likely to be memorized, higher confidence correlates with greater memorization, and representations of the same concepts are more similar across different contexts. Our code and data will be publicly available when the paper is accepted.

dataset, memorization, representation, (15 more...)

arXiv.org Artificial Intelligence

2403.0051

Country:

Asia > China > Beijing > Beijing (0.04)
Europe > Denmark > Capital Region > Copenhagen (0.04)
Oceania > Australia > Australian Capital Territory > Canberra (0.04)
(7 more...)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)

Add feedback

Be like a Goldfish, Don't Memorize! Mitigating Memorization in Generative LLMs

Hans, Abhimanyu, Wen, Yuxin, Jain, Neel, Kirchenbauer, John, Kazemi, Hamid, Singhania, Prajwal, Singh, Siddharth, Somepalli, Gowthami, Geiping, Jonas, Bhatele, Abhinav, Goldstein, Tom

arXiv.org Artificial IntelligenceJun-14-2024

To mitigate memorization, we introduce a subtle modification to the next-token training objective that we call the goldfish loss. During training, a randomly sampled subset of tokens are excluded from the loss computation. These dropped tokens are not memorized by the model, which prevents verbatim reproduction of a complete chain of tokens from the training set. We run extensive experiments training billion-scale Llama-2 models, both pre-trained and trained from scratch, and demonstrate significant reductions in extractable memorization with little to no impact on downstream benchmarks.

goldfish loss, memorization, training data, (13 more...)

arXiv.org Artificial Intelligence

2406.10209

Country:

Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
South America > Colombia > Meta Department > Villavicencio (0.04)
(5 more...)

Genre: Research Report (0.64)

Industry: Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback