AITopics

2502.07516

Genre: Research Report > New Finding (0.34)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)

Wang, Wenhao, Dziedzic, Adam, Kim, Grace C., Backes, Michael, Boenisch, Franziska

Captured by Captions: On Memorization and its Mitigation in CLIP Models

arXiv.org Artificial Intelligence, Feb-10-2025

Multi-modal models, such as CLIP, have demonstrated strong performance in aligning visual and textual representations, excelling in tasks like image retrieval and zero-shot classification. Despite this success, the mechanisms by which these models utilize training data, particularly the role of memorization, remain unclear. In uni-modal models, both supervised and self-supervised, memorization has been shown to be essential for generalization. However, it is not well understood how these findings would apply to CLIP, which incorporates elements from both supervised learning, via captions that provide a supervisory signal similar to labels, and self-supervised learning, via the contrastive objective. To bridge this gap in understanding, we propose a formal definition of memorization in CLIP (CLIPMem) and use it to quantify memorization in CLIP models. Our results indicate that CLIP's memorization behavior falls between the supervised and self-supervised paradigms, with "mis-captioned" samples exhibiting the highest levels of memorization. Additionally, we find that the text encoder contributes more to memorization than the image encoder, suggesting that mitigation strategies should focus on the text domain. Building on these insights, we propose multiple strategies to reduce memorization while at the same time improving utility, something that had not been shown before for traditional learning paradigms, where reducing memorization typically results in a decrease in utility.
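For readers unfamiliar with the contrastive objective mentioned in the abstract, the following is a minimal sketch of a symmetric image-text contrastive loss of the kind CLIP is trained with. It is not the paper's CLIPMem metric. The function names, the toy 2-D embeddings, and the fixed temperature of 0.07 are illustrative assumptions (in practice the temperature is usually learned).

```python
import math

def _normalize(v):
    # Scale a vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def _cross_entropy(logits, target):
    # Numerically stable -log softmax(logits)[target].
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]

def clip_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired embeddings.

    Pair i (image i, caption i) is the positive; every other
    combination in the batch acts as a negative.
    """
    imgs = [_normalize(v) for v in image_embs]
    txts = [_normalize(v) for v in text_embs]
    n = len(imgs)
    # logits[i][j] = cosine similarity of image i and caption j, scaled.
    logits = [
        [sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature
         for j in range(n)]
        for i in range(n)
    ]
    # Image-to-text direction: each row should pick its own caption.
    loss_i2t = sum(_cross_entropy(logits[i], i) for i in range(n)) / n
    # Text-to-image direction: each column should pick its own image.
    loss_t2i = sum(
        _cross_entropy([logits[i][j] for i in range(n)], j) for j in range(n)
    ) / n
    return 0.5 * (loss_i2t + loss_t2i)

matched = clip_contrastive_loss([[1, 0], [0, 1]], [[1, 0], [0, 1]])
swapped = clip_contrastive_loss([[1, 0], [0, 1]], [[0, 1], [1, 0]])
print(matched < swapped)
```

In this toy batch, swapping the two captions plays the role of a "mis-captioned" pair: the loss is much higher, so a model can only fit such pairs by treating them as exceptions.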

artificial intelligence, machine learning, memorization, (17 more...)

2502.0783

Country:

Europe > Poland (0.04)
Africa > Central African Republic > Ombella-M'Poko > Bimbo (0.04)
North America > United States > Wisconsin > Sauk County (0.04)
(5 more...)

Genre: Research Report > New Finding (0.88)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)

Neural Information Processing Systems, Feb-9-2025, 00:08:51 GMT

The Bayesian Case Model: A Generative Approach for Case-Based Reasoning and Prototype Classification

Been Kim, Cynthia Rudin, Julie A. Shah

We present the Bayesian Case Model (BCM), a general framework for Bayesian case-based reasoning (CBR) and prototype classification and clustering. BCM brings the intuitive power of CBR to a Bayesian generative framework. The BCM learns prototypes, the "quintessential" observations that best represent clusters in a dataset, by performing joint inference on cluster labels, prototypes and important features. Simultaneously, BCM pursues sparsity by learning subspaces, the sets of features that play important roles in the characterization of the prototypes. The prototype and subspace representation provides quantitative benefits in interpretability while preserving classification accuracy. Human subject experiments verify statistically significant improvements to participants' understanding when using explanations produced by BCM, compared to those given by prior art.

artificial intelligence, machine learning, prototype, (17 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Europe > Norway > Central Norway > Trøndelag > Trondheim (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.47)

Industry:

Media > Film (0.69)
Leisure & Entertainment (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)

Neural Information Processing Systems, Feb-7-2025, 18:17:43 GMT

Review for NeurIPS paper: Early-Learning Regularization Prevents Memorization of Noisy Labels

Weaknesses: I have many reservations about the claims of the paper. I would appreciate it if the authors could clarify some of these issues in their rebuttal. First, the proof of their main theorem about logistic regression has many issues. One key issue is that the authors make assumptions within the proof that are not clearly stated or justified upfront. For example, in Line 440 of the supplementary materials, the proof assumes that theta Tv .1.

early-learning regularization prevent memorization, neurips paper, noisy label, (4 more...)

Genre:

Research Report > New Finding (0.59)
Research Report > Experimental Study (0.43)

Industry: Education > Educational Setting > Preschool (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (0.42)

Neural Information Processing Systems, Feb-7-2025, 18:17:35 GMT

Review for NeurIPS paper: Early-Learning Regularization Prevents Memorization of Noisy Labels

The paper studies the following interesting phenomenon (observed in the previous literature): when trained on a dataset with incorrectly labeled points (i.e. "label noise"), DNNs first learn the benign ("correctly labeled") points, and once this is done they start "memorizing" the noisy points. It was previously shown empirically in the literature that this second "memorization" phase hurts generalization. The authors make two contributions: (Contribution 1) They demonstrate, empirically and theoretically, that a similar phenomenon can be observed in the simpler setting of over-parametrized (dimensionality larger than the number of points) linear two-class logistic regression, when the class distributions are isotropic Gaussians with fixed means $\pm\mu$ and vanishing variance (see Theorem 1 and Figure A.1). (Contribution 2) Motivated by the theory of Contribution 1, the authors propose a novel regularizer. When used in vanilla DNN training with the cross-entropy loss, this regularizer successfully prevents the networks from entering the "memorization phase" (as evidenced by Figure 1). All the reviewers agree that the topic and the focus of this paper are very timely.

contribution, early-learning regularization prevent memorization, sigma sqrt, (9 more...)

Industry: Education > Educational Setting > Preschool (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (0.84)

Bossy, Thierry, Vignoud, Julien, Rabbani, Tahseen, Pastoriza, Juan R. Troncoso, Jaggi, Martin

Mitigating Unintended Memorization with LoRA in Federated Learning for LLMs

arXiv.org Artificial Intelligence, Feb-7-2025

Federated learning (FL) is a popular paradigm for collaborative training which avoids direct data exposure between clients. However, data privacy issues remain: FL-trained large language models are capable of memorizing and completing phrases and sentences contained in training data when given their prefixes. Thus, it is possible for adversarial and honest-but-curious clients to recover training data of other participants simply through targeted prompting. In this work, we demonstrate that a popular and simple fine-tuning strategy, low-rank adaptation (LoRA), reduces memorization during FL by up to a factor of 10. We study this effect by performing a medical question-answering fine-tuning task and injecting multiple replicas of out-of-distribution sensitive sequences drawn from an external clinical dataset. We observe a reduction in memorization for a wide variety of Llama 2 and 3 models, and find that LoRA can reduce memorization in centralized learning as well. Furthermore, we show that LoRA can be combined with other privacy-preserving techniques such as gradient clipping and Gaussian noising, secure aggregation, and Goldfish loss to further improve record-level privacy while maintaining performance.
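As background for the abstract above, here is a minimal sketch of the low-rank adaptation idea: the pretrained weight W stays frozen and only two small matrices B and A are trained, giving an effective weight W + (alpha / r) · B·A. The helper names, matrix sizes, and the alpha value are illustrative assumptions, not the configuration used in the paper.

```python
def matmul(a, b):
    # Plain nested-list matrix product: (m x n) @ (n x p) -> (m x p).
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def lora_effective_weight(w, lora_b, lora_a, alpha, r):
    """Return W + (alpha / r) * B @ A for a frozen W of shape (d x k).

    lora_b has shape (d x r) and lora_a has shape (r x k); only these
    two matrices would receive gradient updates during fine-tuning.
    """
    delta = matmul(lora_b, lora_a)
    scale = alpha / r
    return [
        [w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
        for i in range(len(w))
    ]

def trainable_params(d, k, r):
    # Full fine-tuning updates d*k entries; LoRA updates r*(d + k).
    return {"full": d * k, "lora": r * (d + k)}

w = [[1.0, 2.0], [3.0, 4.0]]
# B is conventionally initialised to zero, so training starts from W itself.
print(lora_effective_weight(w, [[0.0], [0.0]], [[1.0, 1.0]], alpha=2, r=1))
print(trainable_params(4096, 4096, 8))
```

The parameter count shows how strongly the update is constrained: for a 4096 x 4096 layer at rank 8, LoRA trains 65,536 values instead of 16,777,216. The code does not test whether that constraint is what reduces memorization.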

large language model, machine learning, natural language, (14 more...)

2502.05087

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Switzerland (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
(2 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Information Technology > Security & Privacy (1.00)
Education > Curriculum > Subject-Specific Education (0.67)
Health & Medicine > Health Care Technology > Medical Record (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)

arXiv.org Machine Learning, Feb-5-2025

Taking a Big Step: Large Learning Rates in Denoising Score Matching Prevent Memorization

Wu, Yu-Han, Marion, Pierre, Biau, Gérard, Boyer, Claire

Denoising score matching plays a pivotal role in the performance of diffusion-based generative models. However, the empirical optimal score--the exact solution to the denoising score matching--leads to memorization, where generated samples replicate the training data. Yet, in practice, only a moderate degree of memorization is observed, even without explicit regularization. In this paper, we investigate this phenomenon by uncovering an implicit regularization mechanism driven by large learning rates. Specifically, we show that in the small-noise regime, the empirical optimal score exhibits high irregularity. We then prove that, when trained by stochastic gradient descent with a large enough learning rate, neural networks cannot stably converge to a local minimum with arbitrarily small excess risk. Consequently, the learned score cannot be arbitrarily close to the empirical optimal score, thereby mitigating memorization. To make the analysis tractable, we consider one-dimensional data and two-layer neural networks. Experiments validate the crucial role of the learning rate in preventing memorization, even beyond the one-dimensional setting.
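To make the notion of the empirical optimal score concrete, the sketch below evaluates it in one dimension. It assumes the training points are perturbed with additive Gaussian noise of standard deviation sigma and no rescaling of the data; the paper's exact parameterization may differ. Under that assumption the optimal score is the score of a Gaussian mixture centred on the training points. The function name and toy data are hypothetical.

```python
import math

def empirical_optimal_score(x, train_points, sigma):
    """Score (gradient of the log-density) of the mixture
    (1/n) * sum_i N(x_i, sigma^2), evaluated at x.

    Closed form: sum_i w_i(x) * (x_i - x) / sigma^2, where w_i is a
    softmax over -(x - x_i)^2 / (2 sigma^2).
    """
    log_w = [-((x - xi) ** 2) / (2 * sigma ** 2) for xi in train_points]
    m = max(log_w)  # subtract the max for numerical stability
    w = [math.exp(lw - m) for lw in log_w]
    total = sum(w)
    pull = sum(wi * (xi - x) for wi, xi in zip(w, train_points)) / total
    return pull / sigma ** 2

train = [-1.0, 1.0]
for sigma in (1.0, 0.1):
    print(sigma, empirical_optimal_score(0.9, train, sigma))
```

With small sigma the softmax weights collapse onto the nearest training point, so the score pulls samples straight onto that point, with a steep change near the midpoint between training points. This is the irregular, memorizing target that the abstract argues a network trained with a large learning rate cannot fit arbitrarily well.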

artificial intelligence, machine learning, memorization, (13 more...)


2502.03435

Country:

Europe > France (0.04)
North America > United States > New York (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)

arXiv.org Artificial Intelligence, Feb-3-2025

Skewed Memorization in Large Language Models: Quantification and Decomposition

Li, Hao, Huang, Di, Wang, Ziyu, Rahmani, Amir M.

Memorization in Large Language Models (LLMs) poses privacy and security risks, as models may unintentionally reproduce sensitive or copyrighted data. Existing analyses focus on average-case scenarios, often neglecting the highly skewed distribution of memorization. This paper examines memorization in LLM supervised fine-tuning (SFT), exploring its relationships with training duration, dataset size, and inter-sample similarity. By analyzing memorization probabilities over sequence lengths, we link this skewness to the token generation process, offering insights for estimating memorization and comparing it to established metrics. Through theoretical analysis and empirical evaluation, we provide a comprehensive understanding of memorization behaviors and propose strategies to detect and mitigate risks, contributing to more privacy-preserving LLMs.
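One way to see how the token generation process can produce a skewed distribution is the chain rule: under sampling, the probability of reproducing a sequence verbatim is the product of its per-token conditional probabilities. The sketch below illustrates only that general fact; it is not the paper's estimator, and the function name and the per-token probabilities are made-up assumptions.

```python
import math

def verbatim_probability(token_probs):
    """Probability of emitting every token of a sequence in order,
    given each token's conditional probability (all must be > 0).

    Summing logs avoids underflow for long sequences.
    """
    return math.exp(sum(math.log(p) for p in token_probs))

# Two hypothetical training samples whose per-token probabilities
# differ only modestly, compared at a continuation length of 50.
well_fit = verbatim_probability([0.99] * 50)
less_fit = verbatim_probability([0.90] * 50)
print(well_fit / less_fit)
```

Because the product compounds over the sequence length, a modest per-token gap turns into a gap of orders of magnitude at the sequence level. A few samples can then account for most of the memorization mass while the average stays low.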

large language model, machine learning, natural language, (17 more...)

2502.01187

Country: North America > United States > California > Orange County > Irvine (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Information Technology > Security & Privacy (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)

Dankers, Verna, Raunak, Vikas

Memorization Inheritance in Sequence-Level Knowledge Distillation for Neural Machine Translation

arXiv.org Artificial Intelligence, Feb-3-2025

In this work, we explore how instance-level memorization in the teacher Neural Machine Translation (NMT) model gets inherited by the student model in sequence-level knowledge distillation (SeqKD). We find that despite not directly seeing the original training data, students memorize more than baseline models (models of the same size, trained on the original data) -- 3.4% for exact matches and 57% for extractive memorization -- and show increased hallucination rates. Further, under this SeqKD setting, we also characterize how students behave on specific training data subgroups, such as subgroups with low quality and specific counterfactual memorization (CM) scores, and find that students exhibit amplified denoising on low-quality subgroups. Finally, we propose a modification to SeqKD named Adaptive-SeqKD, which intervenes in SeqKD to reduce memorization and hallucinations. Overall, we recommend caution when applying SeqKD: students inherit both their teachers' superior performance and their fault modes, thereby requiring active monitoring.

artificial intelligence, machine learning, natural language, (16 more...)

2502.01491

Country:

North America > Trinidad and Tobago (0.04)
North America > Canada (0.04)
Europe > Finland > Uusimaa > Helsinki (0.04)
(5 more...)

Genre: Research Report (0.50)

Industry: Education (0.88)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)

Neural Information Processing Systems, Jan-27-2025, 12:50:16 GMT

Reviews: Small ReLU networks are powerful memorizers: a tight analysis of memorization capacity

The paper investigates the problem of expressiveness in neural networks w.r.t. The authors also show an upper bound for classification, a corollary of which is that a three-hidden-layer network with hidden layers of sizes 2k-2k-4k can perfectly classify ImageNet. Moreover, they show that if the overall sum of hidden nodes in a ResNet is of order N/d_x, where d_x is the input dimension, then again the network can perfectly realize the data. Lastly, an analysis is given showing that batch SGD initialized close to a global minimum will come close to a point with value significantly smaller than the loss at initialization (though a convergence guarantee could not be given). The paper is clear and easy to follow for the most part, and conveys a feeling that the authors did their best to make the analysis as thorough and exhaustive as possible, providing results for various settings.

memorization capacity, powerful memorizer, small relu network, (2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (0.40)