AITopics | Rote Learning

Diffusion models have emerged as a powerful framework for generative modeling. At the heart of the methodology is score matching: learning gradients of families of log-densities for noisy versions of the data distribution at different scales. When the loss function adopted in score matching is evaluated using empirical data, rather than the population loss, the minimizer corresponds to the score of a time-dependent Gaussian mixture. However, use of this analytically tractable minimizer leads to data memorization: in both unconditioned and conditioned settings, the generative model returns the training samples. This paper contains an analysis of the dynamical mechanism underlying memorization. The analysis highlights the need for regularization to avoid reproducing the analytically tractable minimizer; and, in so doing, lays the foundations for a principled understanding of how to regularize. Numerical experiments investigate the properties of: (i) Tikhonov regularization; (ii) regularization designed to promote asymptotic consistency; and (iii) regularizations induced by under-parameterization of a neural network or by early stopping when training a neural network. These experiments are evaluated in the context of memorization, and directions for future development of regularization are highlighted.
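The memorization mechanism described in this abstract can be sketched numerically. The block below is a minimal illustration, not the paper's implementation: it assumes a forward noising of the form x_t = m·x_0 + σ·ε with standard Gaussian ε, so that the empirical noised density is an equal-weight Gaussian mixture centered at the scaled training points. The function names `empirical_score` and `denoised_estimate` and the parameters `sigma` and `mean_scale` are hypothetical, chosen for this example only.

```python
import math


def empirical_score(x, data, sigma, mean_scale=1.0):
    """Gradient of log p(x) for p = (1/N) * sum_n N(mean_scale * x_n, sigma^2 I).

    Assumption: the noise schedule enters only through mean_scale and sigma.
    """
    centers = [[mean_scale * c for c in xn] for xn in data]
    # Log-weights of each mixture component, up to a shared constant.
    logits = [
        -sum((xi - ci) ** 2 for xi, ci in zip(x, c)) / (2 * sigma ** 2)
        for c in centers
    ]
    # Softmax over components, shifted by the max logit for stability.
    top = max(logits)
    weights = [math.exp(l - top) for l in logits]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Score is the weighted average of (center - x) / sigma^2.
    return [
        sum(w * (c[i] - x[i]) for w, c in zip(weights, centers)) / sigma ** 2
        for i in range(len(x))
    ]


def denoised_estimate(x, data, sigma, mean_scale=1.0):
    """Posterior mean of the clean sample implied by the empirical score."""
    score = empirical_score(x, data, sigma, mean_scale)
    return [(xi + sigma ** 2 * si) / mean_scale for xi, si in zip(x, score)]


# Hypothetical two-point training set: at small noise the estimate
# collapses onto the nearest training sample.
train = [[0.0, 0.0], [1.0, 1.0]]
estimate = denoised_estimate([0.9, 0.8], train, sigma=0.05)
```

As `sigma` shrinks, the softmax weights concentrate on the single nearest training point, so the denoised estimate lands on a training sample. This is one way to see why sampling with the exact empirical minimizer reproduces the training data, and why some form of regularization is needed.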

artificial intelligence, machine learning, memorization and regularization, (16 more...)

arXiv.org Artificial Intelligence

2501.15785

Country: North America > United States > California (0.14)

Genre: Research Report (0.63)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (1.00)


Memorization and Optimization in Deep Neural Networks with Minimum Over-parameterization

Simone Bombari

Neural Information Processing Systems | Jan-25-2025, 22:30:14 GMT

The Neural Tangent Kernel (NTK) has emerged as a powerful tool to provide memorization, optimization and generalization guarantees in deep neural networks. A line of work has studied the NTK spectrum for two-layer and deep networks with at least a layer with Ω(N) neurons, N being the number of training samples. Furthermore, there is increasing evidence suggesting that deep networks with sublinear layer widths are powerful memorizers and optimizers, as long as the number of parameters exceeds the number of samples. Thus, a natural open question is whether the NTK is well conditioned in such a challenging sub-linear setup. In this paper, we answer this question in the affirmative. Our key technical contribution is a lower bound on the smallest NTK eigenvalue for deep networks with the minimum possible over-parameterization: up to logarithmic factors, the number of parameters is Ω(N) and, hence, the number of neurons is as little as Ω(√N). To showcase the applicability of our NTK bounds, we provide two results concerning memorization capacity and optimization guarantees for gradient descent training.

artificial intelligence, machine learning, neural network, (15 more...)

Neural Information Processing Systems

Country: Europe (0.28)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (0.82)


Review for NeurIPS paper: Neural Networks Learning and Memorization with (almost) no Over-Parameterization

Neural Information Processing Systems | Jan-25-2025, 04:36:36 GMT

Weaknesses: One of my concerns is the rigorousness of the paper. A key lemma, namely Lemma 12 in the supplementary material is only given with a proof sketch. Moreover, in the proof sketch, how the authors handle the general M-decent activation functions is discussed very ambiguously. This makes the results for ReLU activation function particularly questionable. The significance and novelty of this paper compared with the existing results are also not fully demonstrated. It is claimed in this paper that a tight analysis is given on the convergence of NTK to its expectations.

neural network learning and memorization, neurips paper, over-parameterization, (4 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (0.40)


Neural Networks Learning and Memorization with (almost) no Over-Parameterization

Amit Daniely, The Hebrew University and Google Research Tel-Aviv, amit.daniely@mail.huji.ac.il

Neural Information Processing Systems | Jan-25-2025, 04:36:34 GMT

Many results in recent years established polynomial time learnability of various models via neural networks algorithms (e.g.

artificial intelligence, machine learning, neural network, (15 more...)

Neural Information Processing Systems

Country: Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (0.42)


Review for NeurIPS paper: Neural Networks Learning and Memorization with (almost) no Over-Parameterization

Neural Information Processing Systems | Jan-25-2025, 04:36:28 GMT

This paper studies optimization in the NTK regime, further improving the best prior width bounds for random data (I believe Oymak-Soltanolkotabi were the prior best). The reviewers and I were all favorable, and I look forward to seeing this paper appear, and support the authors in further investigations. Relatedly, this point was not sufficiently handled in the rebuttal, despite the rebuttal using less than half a page. Please consider such things in the future. One is by Roman Vershynin, and I believe Sebastien Bubeck and colleagues also had a paper on the "Baum" problem.

neural network learning and memorization, neurips paper, over-parameterization, (1 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (0.40)


Decoding Generalization from Memorization in Deep Neural Networks

Ketha, Simran, Ramaswamy, Venkatakrishnan

arXiv.org Artificial Intelligence | Jan-24-2025

Overparameterized Deep Neural Networks that generalize well have been key to the dramatic success of Deep Learning in recent years. The reasons for their remarkable ability to generalize are not well understood yet. It has also been known that deep networks possess the ability to memorize training data, as evidenced by perfect or high training accuracies on models trained with corrupted data that have class labels shuffled to varying degrees. Concomitantly, such models are known to generalize poorly, i.e. they suffer from poor test accuracies, due to which it is thought that the act of memorizing substantially degrades the ability to generalize. It has, however, been unclear why the poor generalization that accompanies such memorization comes about. One possibility is that in the process of training with corrupted data, the layers of the network irretrievably reorganize their representations in a manner that makes generalization difficult. The other possibility is that the network retains significant ability to generalize, but the trained network somehow chooses to read out in a manner that is detrimental to generalization. Here, we provide evidence for the latter possibility by demonstrating, empirically, that such models possess information in their representations for substantially improved generalization, even in the face of memorization. Furthermore, such generalization abilities can be easily decoded from the internals of the trained model, and we build a technique to do so from the outputs of specific layers of the network. We demonstrate results on multiple models trained with a number of standard datasets.

accuracy, artificial intelligence, machine learning, (11 more...)

arXiv.org Artificial Intelligence

2501.14687

Genre: Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (0.81)


Network size and weights size for memorization with two layers neural networks

Neural Information Processing Systems | Jan-23-2025, 08:30:16 GMT

In 1988, Eric B. Baum showed that two-layers neural networks with threshold activation function can perfectly memorize the binary labels of n points in general position in R

artificial intelligence, machine learning, neuron, (17 more...)

Neural Information Processing Systems

Country: North America (0.28)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (0.42)


Review for NeurIPS paper: Network size and size of the weights in memorization with two-layers neural networks

Neural Information Processing Systems | Jan-23-2025, 08:30:10 GMT

The boosting part is not clear to me. What is the underlying game? 3. Notion of error: This work considers a multiplicative error model to define memorization (e.g., in Lemma 1).

memorization, network size and size, two-layer neural network, (4 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning > Rote Learning (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.40)
