AITopics | Rote Learning

Collaborating Authors

Rote Learning

News Overviews Instructional Materials AI-Alerts Classics

Counterfactual Memorization in Neural Language Models

Zhang, Chiyuan, Ippolito, Daphne, Lee, Katherine, Jagielski, Matthew, Tramèr, Florian, Carlini, Nicholas

arXiv.org Artificial IntelligenceDec-23-2021

Modern neural language models widely used in tasks across NLP risk memorizing sensitive information from their training data. As models continue to scale up in parameters, training data, and compute, understanding memorization in language models is both important from a learning-theoretical point of view, and is practically crucial in real world applications. An open question in previous studies of memorization in language models is how to filter out "common" memorization. In fact, most memorization criteria strongly correlate with the number of occurrences in the training set, capturing "common" memorization such as familiar phrases, public knowledge or templated texts. In this paper, we provide a principled perspective inspired by a taxonomy of human memory in Psychology. From this perspective, we formulate a notion of counterfactual memorization, which characterizes how a model's predictions change if a particular document is omitted during training. We identify and study counterfactually-memorized training examples in standard text datasets. We further estimate the influence of each training example on the validation set and on generated texts, and show that this can provide direct evidence of the source of memorization at test time.

artificial intelligence, consumer health, machine learning, (30 more...)

arXiv.org Artificial Intelligence

2112.12938

Country:

Europe > Moldova (1.00)
Asia > Middle East > Israel (0.68)
Atlantic Ocean (0.45)
(22 more...)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment > Sports > Soccer (1.00)
Leisure & Entertainment > Sports > Hockey (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
(17 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)

Add feedback

Personalized Federated Learning through Local Memorization

Marfoq, Othmane, Neglia, Giovanni, Kameni, Laetitia, Vidal, Richard

arXiv.org Machine LearningNov-17-2021

Federated learning allows clients to collaboratively learn statistical models while keeping their data local. Federated learning was originally used to train a unique global model to be served to all clients, but this approach might be sub-optimal when clients' local data distributions are heterogeneous. In order to tackle this limitation, recent personalized federated learning methods train a separate model for each client while still leveraging the knowledge available at other clients. In this work, we exploit the ability of deep neural networks to extract high quality vectorial representations (embeddings) from non-tabular data, e.g., images and text, to propose a personalization mechanism based on local memorization. Personalization is obtained interpolating a pre-trained global model with a $k$-nearest neighbors (kNN) model based on the shared representation provided by the global model. We provide generalization bounds for the proposed approach and we show on a suite of federated datasets that this approach achieves significantly higher accuracy and fairness than state-of-the-art methods.

dataset, global model, representation, (14 more...)

arXiv.org Machine Learning

2111.0936

Country:

North America > United States > Virginia (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > France > Provence-Alpes-Côte d'Azur (0.04)
(2 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (0.62)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

On the Optimal Memorization Power of ReLU Neural Networks

Vardi, Gal, Yehudai, Gilad, Shamir, Ohad

arXiv.org Machine LearningOct-7-2021

We study the memorization power of feedforward ReLU neural networks. We show that such networks can memorize any $N$ points that satisfy a mild separability assumption using $\tilde{O}\left(\sqrt{N}\right)$ parameters. Known VC-dimension upper bounds imply that memorizing $N$ samples requires $\Omega(\sqrt{N})$ parameters, and hence our construction is optimal up to logarithmic factors. We also give a generalized construction for networks with depth bounded by $1 \leq L \leq \sqrt{N}$, for memorizing $N$ samples using $\tilde{O}(N/L)$ parameters. This bound is also optimal up to logarithmic factors. Our construction uses weights with large bit complexity. We prove that having such a large bit complexity is both necessary and sufficient for memorization with a sub-linear number of parameters.

bit complexity, construction, neural network, (12 more...)

arXiv.org Machine Learning

2110.03187

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (0.83)

Add feedback

Memorization in Deep Neural Networks: Does the Loss Function matter?

Patel, Deep, Sastry, P. S.

arXiv.org Machine LearningJul-22-2021

Deep Neural Networks, often owing to the overparameterization, are shown to be capable of exactly memorizing even randomly labelled data. Empirical studies have also shown that none of the standard regularization techniques mitigate such overfitting. We investigate whether the choice of the loss function can affect this memorization. We empirically show, with benchmark data sets MNIST and CIFAR-10, that a symmetric loss function, as opposed to either cross-entropy or squared error loss, results in significant improvement in the ability of the network to resist such overfitting. We then provide a formal definition for robustness to memorization and provide a theoretical explanation as to why the symmetric losses provide this robustness. Our results clearly bring out the role loss functions alone can play in this phenomenon of memorization.

accuracy, loss function, memorization, (13 more...)

arXiv.org Machine Learning

2107.09957

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > Spain > Andalusia > Cádiz Province > Cadiz (0.04)
Asia > India > NCT > New Delhi (0.04)
(2 more...)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.86)

Add feedback

On the Memorization Properties of Contrastive Learning

Sadrtdinov, Ildus, Chirkova, Nadezhda, Lobacheva, Ekaterina

arXiv.org Machine LearningJul-21-2021

Memorization studies of deep neural networks (DNNs) help to understand what patterns and how do DNNs learn, and motivate improvements to DNN training approaches. In this work, we investigate the memorization properties of SimCLR, a widely used contrastive self-supervised learning approach, and compare them to the memorization of supervised learning and random labels training. We find that both training objects and augmentations may have different complexity in the sense of how SimCLR learns them. Moreover, we show that SimCLR is similar to random labels training in terms of the distribution of training objects complexity.

augmentation, epoch, simclr, (14 more...)

arXiv.org Machine Learning

2107.10143

Country:

North America > Canada > Quebec > Montreal (0.04)
Europe > France (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.71)

Add feedback

Quran Memorization Course. A Proven System To Do It Easy NOW

#artificialintelligenceJun-29-2021, 19:48:25 GMT

In this Course you will learn and gain 6 new habits. Each habit will make big change in your Memorization Ability. Many people who have taken this course before were able to memorize the whole holy Quran short Time. Even some of them were able to memorize the whole Quran in short Time. This course helped myself and when I noticed the amazing results, I have decided to do this course publicly to help million of Muslims around the world.

memorize, proven system, quran memorization course, (1 more...)

#artificialintelligence

Genre: Instructional Material > Course Syllabus & Notes (1.00)

Industry: Education (0.42)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (0.75)

Add feedback

An Exponential Improvement on the Memorization Capacity of Deep Threshold Networks

Rajput, Shashank, Sreenivasan, Kartik, Papailiopoulos, Dimitris, Karbasi, Amin

arXiv.org Machine LearningJun-14-2021

It is well known that modern deep neural networks are powerful enough to memorize datasets even when the labels have been randomized. Recently, Vershynin (2020) settled a long standing question by Baum (1988), proving that \emph{deep threshold} networks can memorize $n$ points in $d$ dimensions using $\widetilde{\mathcal{O}}(e^{1/\delta^2}+\sqrt{n})$ neurons and $\widetilde{\mathcal{O}}(e^{1/\delta^2}(d+\sqrt{n})+n)$ weights, where $\delta$ is the minimum distance between the points. In this work, we improve the dependence on $\delta$ from exponential to almost linear, proving that $\widetilde{\mathcal{O}}(\frac{1}{\delta}+\sqrt{n})$ neurons and $\widetilde{\mathcal{O}}(\frac{d}{\delta}+n)$ weights are sufficient. Our construction uses Gaussian random weights only in the first layer, while all the subsequent layers use binary or integer weights. We also prove new lower bounds by connecting memorization in neural networks to the purely geometric problem of separating $n$ points on a sphere using hyperplanes.

hyperplane, neuron, probability, (17 more...)

arXiv.org Machine Learning

2106.07724

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
Europe > France (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
(2 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (0.61)

Add feedback

Rethink the Connections among Generalization, Memorization and the Spectral Bias of DNNs

Zhang, Xiao, Xiong, Haoyi, Wu, Dongrui

arXiv.org Machine LearningJun-5-2021

Over-parameterized deep neural networks (DNNs) with sufficient capacity to memorize random noise can achieve excellent generalization performance, challenging the bias-variance trade-off in classical learning theory. Recent studies claimed that DNNs first learn simple patterns and then memorize noise; some other works showed a phenomenon that DNNs have a spectral bias to learn target functions from low to high frequencies during training. However, we show that the monotonicity of the learning bias does not always hold: under the experimental setup of deep double descent, the high-frequency components of DNNs diminish in the late stage of training, leading to the second descent of the test error. Besides, we find that the spectrum of DNNs can be applied to indicating the second descent of the test error, even though it is calculated from the training set only.

descent, second descent, test error, (15 more...)

arXiv.org Machine Learning

2004.13954

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Africa > Ethiopia > Addis Ababa > Addis Ababa (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
(8 more...)

Genre:

Research Report > New Finding (0.46)
Research Report > Experimental Study (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (0.42)

Add feedback

Fundamental tradeoffs between memorization and robustness in random features and neural tangent regimes

Dohmatob, Elvis

arXiv.org Machine LearningJun-4-2021

This work studies the (non)robustness of two-layer neural networks in various high-dimensional linearized regimes. We establish fundamental trade-offs between memorization and robustness, as measured by the Sobolev-seminorm of the model w.r.t the data distribution, i.e the square root of the average squared $L_2$-norm of the gradients of the model w.r.t the its input. More precisely, if $n$ is the number of training examples, $d$ is the input dimension, and $k$ is the number of hidden neurons in a two-layer neural network, we prove for a large class of activation functions that, if the model memorizes even a fraction of the training, then its Sobolev-seminorm is lower-bounded by (i) $\sqrt{n}$ in case of infinite-width random features (RF) or neural tangent kernel (NTK) with $d \gtrsim n$; (ii) $\sqrt{n}$ in case of finite-width RF with proportionate scaling of $d$ and $k$; and (iii) $\sqrt{n/k}$ in case of finite-width NTK with proportionate scaling of $d$ and $k$. Moreover, all of these lower-bounds are tight: they are attained by the min-norm / least-squares interpolator (when $n$, $d$, and $k$ are in the appropriate interpolating regime). All our results hold as soon as data is log-concave isotropic, and there is label-noise, i.e the target variable is not a deterministic function of the data / features. We empirically validate our theoretical results with experiments. Accidentally, these experiments also reveal for the first time, (iv) a multiple-descent phenomenon in the robustness of the min-norm interpolator.

activation function, neural network, robustness, (14 more...)

arXiv.org Machine Learning

2106.0263

Country:

North America > United States (0.14)
Europe > United Kingdom > England > Greater London > London (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(2 more...)

Genre: Research Report > New Finding (0.87)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (0.61)

Add feedback

Exploring Memorization in Adversarial Training

Dong, Yinpeng, Xu, Ke, Yang, Xiao, Pang, Tianyu, Deng, Zhijie, Su, Hang, Zhu, Jun

arXiv.org Machine LearningJun-3-2021

It is well known that deep learning models have a propensity for fitting the entire training set even with random labels, which requires memorization of every training sample. In this paper, we investigate the memorization effect in adversarial training (AT) for promoting a deeper understanding of capacity, convergence, generalization, and especially robust overfitting of adversarially trained classifiers. We first demonstrate that deep networks have sufficient capacity to memorize adversarial examples of training data with completely random labels, but not all AT algorithms can converge under the extreme circumstance. Our study of AT with random labels motivates further analyses on the convergence and generalization of AT. We find that some AT methods suffer from a gradient instability issue, and the recently suggested complexity measures cannot explain robust generalization by considering models trained on random labels. Furthermore, we identify a significant drawback of memorization in AT that it could result in robust overfitting. We then propose a new mitigation algorithm motivated by detailed memorization analyses. Extensive experiments on various datasets validate the effectiveness of the proposed method.

adversarial example, international conference, random label, (13 more...)

arXiv.org Machine Learning

2106.01606

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Asia > Middle East > Jordan (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.82)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)

Add feedback