AITopics | Memory-Based Learning

Collaborating Authors

Memory-Based Learning

[Sometimes called Case-Based Reasoning or CBR]
"At the highest level of generality, a general CBR cycle may be described by the following four processes: 1. RETRIEVE the most similar case or cases. 2. REUSE the information and knowledge in that case to solve the problem. 3. REVISE the proposed solution. 4. RETAIN the parts of this experience likely to be useful for future problem solving "– from Case-Based Reasoning: Foundational Issues, Methodological Variations, and System Approaches. By A. Aamodt and E. Plaza. (1994)

News Overviews Instructional Materials AI-Alerts Classics

Emergent and Predictable Memorization in Large Language Models

Biderman, Stella, Prashanth, USVSN Sai, Sutawika, Lintang, Schoelkopf, Hailey, Anthony, Quentin, Purohit, Shivanshu, Raff, Edward

arXiv.org Artificial IntelligenceMay-31-2023

Memorization, or the tendency of large language models (LLMs) to output entire sequences from their training data verbatim, is a key concern for safely deploying language models. In particular, it is vital to minimize a model's memorization of sensitive datapoints such as those containing personal identifiable information (PII). The prevalence of such undesirable memorization can pose issues for model trainers, and may even require discarding an otherwise functional model. We therefore seek to predict which sequences will be memorized before a large model's full train-time by extrapolating the memorization behavior of lower-compute trial runs. We measure memorization of the Pythia model suite and plot scaling laws for forecasting memorization, allowing us to provide equi-compute recommendations to maximize the reliability (recall) of such predictions. We additionally provide further novel discoveries on the distribution of memorization scores across models and data.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2304.11158

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Ohio (0.04)
North America > United States > Maryland > Baltimore County (0.04)
(3 more...)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)

Add feedback

On Influence Functions, Classification Influence, Relative Influence, Memorization and Generalization

Kounavis, Michael, Dia, Ousmane, Ramazanli, Ilqar

arXiv.org Artificial IntelligenceMay-25-2023

Machine learning systems such as large scale recommendation systems or natural language processing systems are usually trained on billions of training points and are associated with hundreds of billions or trillions of parameters. Improving the learning process in such a way that both the training load is reduced and the model accuracy improved is highly desired. In this paper we take a first step toward solving this problem, studying influence functions from the perspective of simplifying the computations they involve. We discuss assumptions, under which influence computations can be performed on significantly fewer parameters. We also demonstrate that the sign of the influence value can indicate whether a training point is to memorize, as opposed to generalize upon. For this purpose we formally define what memorization means for a training point, as opposed to generalization. We conclude that influence functions can be made practical, even for large scale machine learning systems, and that influence values can be taken into account by algorithms that selectively remove training points, as part of the learning process.

artificial intelligence, machine learning, training point, (17 more...)

arXiv.org Artificial Intelligence

2305.16094

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Washington > King County > Bellevue (0.04)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (0.60)

Add feedback

Memorization and Optimization in Deep Neural Networks with Minimum Over-parameterization

Bombari, Simone, Amani, Mohammad Hossein, Mondelli, Marco

arXiv.org Artificial IntelligenceMay-21-2023

The Neural Tangent Kernel (NTK) has emerged as a powerful tool to provide memorization, optimization and generalization guarantees in deep neural networks. A line of work has studied the NTK spectrum for two-layer and deep networks with at least a layer with $\Omega(N)$ neurons, $N$ being the number of training samples. Furthermore, there is increasing evidence suggesting that deep networks with sub-linear layer widths are powerful memorizers and optimizers, as long as the number of parameters exceeds the number of samples. Thus, a natural open question is whether the NTK is well conditioned in such a challenging sub-linear setup. In this paper, we answer this question in the affirmative. Our key technical contribution is a lower bound on the smallest NTK eigenvalue for deep networks with the minimum possible over-parameterization: the number of parameters is roughly $\Omega(N)$ and, hence, the number of neurons is as little as $\Omega(\sqrt{N})$. To showcase the applicability of our NTK bounds, we provide two results concerning memorization capacity and optimization guarantees for gradient descent training.

artificial intelligence, deep learning, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2205.10217

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Switzerland (0.04)
Europe > Austria (0.04)

Genre: Research Report (0.49)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.84)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (0.81)

Add feedback

Case-Based Reasoning with Language Models for Classification of Logical Fallacies

Sourati, Zhivar, Ilievski, Filip, Sandlin, Hông-Ân, Mermoud, Alain

arXiv.org Artificial IntelligenceMay-17-2023

The ease and speed of spreading misinformation and propaganda on the Web motivate the need to develop trustworthy technology for detecting fallacies in natural language arguments. However, state-of-the-art language modeling methods exhibit a lack of robustness on tasks like logical fallacy classification that require complex reasoning. In this paper, we propose a Case-Based Reasoning method that classifies new cases of logical fallacy by language-modeling-driven retrieval and adaptation of historical cases. We design four complementary strategies to enrich input representation for our model, based on external information about goals, explanations, counterarguments, and argument structure. Our experiments in in-domain and out-of-domain settings indicate that Case-Based Reasoning improves the accuracy and generalizability of language models. Our ablation studies suggest that representations of similar cases have a strong impact on the model performance, that models perform well with fewer retrieved cases, and that the size of the case database has a negligible effect on the performance. Finally, we dive deeper into the relationship between the properties of the retrieved cases and the model performance.

argument, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2301.11879

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
Europe > Switzerland (0.04)
North America > United States > New York (0.04)
(6 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.87)

Industry: Government (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback

THUIR@COLIEE 2023: Incorporating Structural Knowledge into Pre-trained Language Models for Legal Case Retrieval

Li, Haitao, Su, Weihang, Wang, Changyue, Wu, Yueyue, Ai, Qingyao, Liu, Yiqun

arXiv.org Artificial IntelligenceMay-11-2023

Legal case retrieval techniques play an essential role in modern intelligent legal systems. As an annually well-known international competition, COLIEE is aiming to achieve the state-of-the-art retrieval model for legal texts. This paper summarizes the approach of the championship team THUIR in COLIEE 2023. To be specific, we design structure-aware pre-trained language models to enhance the understanding of legal cases. Furthermore, we propose heuristic pre-processing and post-processing approaches to reduce the influence of irrelevant messages. In the end, learning-to-rank methods are employed to merge features with different dimensions. Experimental results demonstrate the superiority of our proposal. Official results show that our run has the best performance among all submissions. The implementation of our method can be found at https://github.com/CSHaitao/THUIR-COLIEE2023.

coliee 2023, machine learning, natural language, (12 more...)

arXiv.org Artificial Intelligence

2305.06812

Country:

North America > Canada (0.14)
Europe > Portugal > Braga > Braga (0.05)
Asia > China > Beijing > Beijing (0.05)
(3 more...)

Genre: Research Report > New Finding (0.54)

Industry:

Law (1.00)
Government > Regional Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (0.65)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning (0.65)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

IBM's Watson returns as an AI development studio

EngadgetMay-9-2023, 19:57:17 GMT

Years before everyone was being impressed with the human-like text output of ChatGPT and other generative AI systems, IBM's Watson was blowing our minds on Jeopardy. IBM's cognitive computing project famously dominated its human opponents, but the company had much larger long-term goals, such as using Watson's ability to simulate a human thought process to help doctors diagnose patients and recommend treatments. Now, IBM is pivoting its supercomputer platform into Watsonx, an AI development studio packed with foundation and open-source models companies can use to train their own AI platforms. If that sounds familiar, it may be because NVIDIA recently announced a similar service with its AI Foundations program. Both platforms are designed to give enterprises a way to build, train, scale and deploy an AI platform.

ai development studio, ibm, platform, (5 more...)

Engadget

Industry: Information Technology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.60)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Case Based Reasoning (0.40)

Add feedback

PreCog: Exploring the Relation between Memorization and Performance in Pre-trained Language Models

Ranaldi, Leonardo, Ruzzetti, Elena Sofia, Zanzotto, Fabio Massimo

arXiv.org Artificial IntelligenceMay-9-2023

Pre-trained Language Models such as BERT are impressive machines with the ability to memorize, possibly generalized learning examples. We present here a small, focused contribution to the analysis of the interplay between memorization and performance of BERT in downstream tasks. We propose PreCog, a measure for evaluating memorization from pre-training, and we analyze its correlation with the BERT's performance. Our experiments show that highly memorized examples are better classified, suggesting memorization is an essential key to success for BERT.

artificial intelligence, computational linguistic, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2305.04673

Country:

North America > United States > Washington > King County > Seattle (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(7 more...)

Genre: Research Report (0.83)

Industry: Information Technology > Security & Privacy (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback

CaseEncoder: A Knowledge-enhanced Pre-trained Model for Legal Case Encoding

Ma, Yixiao, Wu, Yueyue, Su, Weihang, Ai, Qingyao, Liu, Yiqun

arXiv.org Artificial IntelligenceMay-9-2023

Legal case retrieval is a critical process for modern legal information systems. While recent studies have utilized pre-trained language models (PLMs) based on the general domain self-supervised pre-training paradigm to build models for legal case retrieval, there are limitations in using general domain PLMs as backbones. Specifically, these models may not fully capture the underlying legal features in legal case documents. To address this issue, we propose CaseEncoder, a legal document encoder that leverages fine-grained legal knowledge in both the data sampling and pre-training phases. In the data sampling phase, we enhance the quality of the training data by utilizing fine-grained law article information to guide the selection of positive and negative examples. In the pre-training phase, we design legal-specific pre-training tasks that align with the judging criteria of relevant legal cases. Based on these tasks, we introduce an innovative loss function called Biased Circle Loss to enhance the model's ability to recognize case relevance in fine grains. Experimental results on multiple benchmarks demonstrate that CaseEncoder significantly outperforms both existing general pre-training models and legal-specific pre-training models in zero-shot legal case retrieval.

artificial intelligence, caseencoder, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2305.05393

Country:

Asia > China > Beijing > Beijing (0.05)
North America > United States > District of Columbia > Washington (0.05)

Genre: Research Report (1.00)

Industry: Law (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning (0.96)

Add feedback

When a CBR in Hand is Better than Twins in the Bush

Ahmed, Mobyen Uddin, Barua, Shaibal, Begum, Shahina, Islam, Mir Riyanul, Weber, Rosina O

arXiv.org Artificial IntelligenceMay-8-2023

AI methods referred to as interpretable are often discredited as inaccurate by supporters of the existence of a trade-off between interpretability and accuracy. In many problem contexts however this trade-off does not hold. This paper discusses a regression problem context to predict flight take-off delays where the most accurate data regression model was trained via the XGBoost implementation of gradient boosted decision trees. While building an XGB-CBR Twin and converting the XGBoost feature importance into global weights in the CBR model, the resultant CBR model alone provides the most accurate local prediction, maintains the global importance to provide a global explanation of the model, and offers the most interpretable representation for local explanations. This resultant CBR model becomes a benchmark of accuracy and interpretability for this problem context, and hence it is used to evaluate the two additive feature attribute methods SHAP and LIME to explain the XGBoost regression model.

artificial intelligence, machine learning, prediction, (15 more...)

arXiv.org Artificial Intelligence

2305.05111

Country:

North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(4 more...)

Genre: Research Report (1.00)

Industry:

Transportation > Air (1.00)
Transportation > Infrastructure & Services (0.94)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.56)

Add feedback

Collaborative Learning in General Graphs with Limited Memorization: Complexity, Learnability, and Reliability

Li, Feng, Yuan, Xuyang, Wang, Lina, Yang, Huan, Yu, Dongxiao, Lv, Weifeng, Cheng, Xiuzhen

arXiv.org Artificial IntelligenceMay-6-2023

We consider a K-armed bandit problem in general graphs where agents are arbitrarily connected and each of them has limited memorizing capabilities and communication bandwidth. The goal is to let each of the agents eventually learn the best arm. It is assumed in these studies that the communication graph should be complete or well-structured, whereas such an assumption is not always valid in practice. Furthermore, limited memorization and communication bandwidth also restrict the collaborations of the agents, since the agents memorize and communicate very few experiences. Additionally, an agent may be corrupted to share falsified experiences to its peers, while the resource limit in terms of memorization and communication may considerably restrict the reliability of the learning process. To address the above issues, we propose a three-staged collaborative learning algorithm. In each step, the agents share their latest experiences with each other through light-weight random walks in a general communication graph, and then make decisions on which arms to pull according to the recommendations received from their peers. The agents finally update their adoptions (i.e., preferences to the arms) based on the reward obtained by pulling the arms. Our theoretical analysis shows that, when there are a sufficient number of agents participating in the collaborative learning process, all the agents eventually learn the best arm with high probability, even with limited memorizing capabilities and light-weight communications. We also reveal in our theoretical analysis the upper bound on the number of corrupted agents our algorithm can tolerate. The efficacy of our proposed three-staged collaborative learning algorithm is finally verified by extensive experiments on both synthetic and real datasets.

agent, artificial intelligence, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2201.12482

Country:

North America > United States (0.14)
Asia > China > Shandong Province > Qingdao (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(2 more...)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (0.80)

Add feedback