On Lazy Training in Differentiable Programming

Neural Information Processing Systems

In this work, we show that this "lazy training" phenomenon is not specific to overparameterized neural networks, and is due to a choice of scaling, often implicit, that makes the model behave as its linearization around the initialization, thus yielding a model equivalent to learning with positive-definite kernels.
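To make the role of scaling concrete, here is a minimal NumPy sketch (an illustrative toy, not the paper's exact construction): it trains a rescaled, centered two-layer model alpha * (f(w, x) - f(w0, x)) with the step size reduced by 1/alpha^2, and measures how far the weights travel from initialization. A large alpha reproduces the lazy regime: the loss still drops, but the parameters barely move, so the model stays close to its linearization.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny two-layer network f(w, x) = a . tanh(W x), parameters w = (W, a).
d, m, n = 3, 50, 20
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

def train(alpha, steps=500, base_lr=0.02):
    W = rng.normal(size=(m, d)) / np.sqrt(d)
    a = rng.normal(size=m) / np.sqrt(m)
    W0, a0 = W.copy(), a.copy()
    f0 = np.tanh(X @ W0.T) @ a0        # predictions at init (used for centering)
    lr = base_lr / alpha**2            # the step-size rescaling that goes with alpha
    for _ in range(steps):
        H = np.tanh(X @ W.T)
        r = alpha * (H @ a - f0) - y   # residuals of the rescaled, centered model
        ga = alpha * H.T @ r / n
        gW = alpha * ((r[:, None] * (1 - H**2) * a).T @ X) / n
        a -= lr * ga
        W -= lr * gW
    # Relative distance traveled from the initialization.
    move = np.sqrt(np.sum((W - W0)**2) + np.sum((a - a0)**2))
    move /= np.sqrt(np.sum(W0**2) + np.sum(a0**2))
    loss = 0.5 * np.mean((alpha * (np.tanh(X @ W.T) @ a - f0) - y)**2)
    return move, loss

move_rich, loss_rich = train(alpha=1.0)
move_lazy, loss_lazy = train(alpha=100.0)
print(move_rich, move_lazy)   # the alpha=100 run barely moves from its initialization
```

Both runs reduce the loss, but the relative weight movement in the alpha=100 run is orders of magnitude smaller, which is exactly why the linearization (and hence a fixed kernel) describes its training.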


Response to reviewers for the paper: "On Lazy Training in Differentiable Programming"

Neural Information Processing Systems

We thank the reviewers for their comments and suggestions. Hereafter, we list the reviewers' (sometimes paraphrased) questions together with our answers; each answer will translate into a clarification in the final version. Reviewers #2 and #3 felt that our message was lacking clarity. We will add more pointers to their statistical analysis from the existing literature (e.g. L81-90 in the main paper; often α(m) = 1/√m in these works).






Limitations of Lazy Training of Two-layers Neural Network

Neural Information Processing Systems

We study the supervised learning problem under either of the following two models: (1) Feature vectors x_i are d-dimensional Gaussians and responses are y_i = f(x_i) for f an unknown quadratic function; (2) Feature vectors x_i are distributed as a mixture of two d-dimensional centered Gaussians, and the y_i's are the corresponding class labels. We use two-layers neural networks with quadratic activations, and compare three different learning regimes: the random features (RF) regime, in which we only train the second-layer weights; the neural tangent (NT) regime, in which we train a linearization of the neural network around its initialization; and the fully trained neural network (NN) regime, in which we train all the weights in the network. We prove that, even for the simple quadratic model of point (1), there is a potentially unbounded gap between the prediction risk achieved in these three training regimes when the number of neurons is smaller than the ambient dimension. When the number of neurons is larger than the number of dimensions, the problem is significantly easier and both NT and NN learning achieve zero risk.
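The three regimes can be sketched on a toy quadratic-activation network (a hypothetical small instance, not the paper's construction): RF fits only the second layer by least squares on frozen random features, NT fits a linear model in the Jacobian features at initialization, and NN runs plain gradient descent on all weights. Since the NT feature set contains the RF features, the NT training fit is never worse than the RF one.

```python
import numpy as np

rng = np.random.default_rng(1)
d, m, n = 5, 3, 200   # note m < d, the regime where the paper shows a gap

# Toy quadratic target y = (v . x)^2, the kind of function the paper studies.
v = rng.normal(size=d); v /= np.linalg.norm(v)
X = rng.normal(size=(n, d))
y = (X @ v)**2

# Shared random init of a two-layer net f(x) = sum_j a_j (w_j . x)^2.
W0 = rng.normal(size=(m, d)) / np.sqrt(d)
a0 = rng.normal(size=m) / np.sqrt(m)
init_mse = np.mean(((X @ W0.T)**2 @ a0 - y)**2)

# --- RF regime: freeze W0, least-squares fit of the second layer. ---
Phi_rf = (X @ W0.T)**2                     # (n, m)
a_rf, *_ = np.linalg.lstsq(Phi_rf, y, rcond=None)
rf_mse = np.mean((Phi_rf @ a_rf - y)**2)

# --- NT regime: least squares in the Jacobian features at init. ---
# d f/d a_j = (w_j . x)^2,  d f/d w_j = 2 a_j (w_j . x) x
Z0 = X @ W0.T
Phi_w = 2 * (Z0 * a0)[:, :, None] * X[:, None, :]       # (n, m, d)
Phi_nt = np.concatenate([Phi_rf, Phi_w.reshape(n, m * d)], axis=1)
theta, *_ = np.linalg.lstsq(Phi_nt, y, rcond=None)
nt_mse = np.mean((Phi_nt @ theta - y)**2)

# --- NN regime: gradient descent on all the weights. ---
W, a = W0.copy(), a0.copy()
for _ in range(3000):
    Z = X @ W.T
    r = (Z**2) @ a - y
    ga = (Z**2).T @ r / n
    gW = (2 * (r[:, None] * Z * a)).T @ X / n
    a -= 0.01 * ga
    W -= 0.01 * gW
nn_mse = np.mean(((X @ W.T)**2 @ a - y)**2)

print(rf_mse, nt_mse, nn_mse)   # NT's training fit is at least as good as RF's
```

This only compares training fits on one toy instance; the paper's result is about the prediction risk of the three regimes, which it separates rigorously when m < d.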


Beyond Lazy Training for Over-parameterized Tensor Decomposition

Neural Information Processing Systems

Over-parametrization is an important technique in training neural networks. In both theory and practice, training a larger network allows the optimization algorithm to avoid bad local optima. In this paper we study a closely related tensor decomposition problem: given an $l$-th order tensor in $(R^d)^{\otimes l}$ of rank $r$ (where $r\ll d$), can variants of gradient descent find a rank $m$ decomposition where $m > r$? We show that in a lazy training regime (similar to the NTK regime for neural networks) one needs at least $m = \Omega(d^{l-1})$, while a variant of gradient descent can find an approximate tensor when $m = O^*(r^{2.5l}\log
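As a toy illustration of the non-lazy setting (dimensions, step size, and initialization scale are arbitrary choices here, not the paper's): plain gradient descent from a small initialization on an over-parameterized rank-m decomposition of a rank-r order-3 tensor.

```python
import numpy as np

rng = np.random.default_rng(2)
d, r, m = 6, 2, 8     # ambient dimension, true rank, over-parameterized rank

# Ground-truth order-3 tensor of rank r with unit-norm components.
A = rng.normal(size=(r, d))
A /= np.linalg.norm(A, axis=1, keepdims=True)
T = np.einsum('ia,ib,ic->abc', A, A, A)

U = 0.1 * rng.normal(size=(m, d))    # small initialization (non-lazy regime)
for _ in range(5000):
    R = np.einsum('ia,ib,ic->abc', U, U, U) - T    # residual tensor
    # Gradient of 0.5 * ||R||_F^2 wrt U; since R is symmetric, the three
    # symmetric terms of the chain rule coincide, giving a factor of 3.
    U -= 0.05 * 3 * np.einsum('abc,ib,ic->ia', R, U, U)

err = np.linalg.norm(np.einsum('ia,ib,ic->abc', U, U, U) - T) / np.linalg.norm(T)
print(err)   # relative reconstruction error after training
```

On instances like this, gradient descent with m > r components and a small initialization typically makes substantial progress, in contrast to the lazy regime, where the paper proves m must grow like d^{l-1}.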


Evolution of Neural Tangent Kernels under Benign and Adversarial Training

Neural Information Processing Systems

Two key challenges facing modern deep learning are mitigating deep networks' vulnerability to adversarial attacks and understanding deep learning's generalization capabilities. Towards the first issue, many defense strategies have been developed, with the most common being Adversarial Training (AT). Towards the second challenge, one of the dominant theories that has emerged is the Neural Tangent Kernel (NTK) -- a characterization of neural network behavior in the infinite-width limit. In this limit, the kernel is frozen and the underlying feature map is fixed. At finite widths, however, there is evidence that feature learning happens in the earlier stages of training (kernel learning) before a second phase where the kernel remains fixed (lazy training).
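The kernel's evolution at finite width is easy to observe directly. A small NumPy sketch (illustrative sizes, not the paper's experiments): compute the empirical NTK Gram matrix J J^T of a narrow two-layer network before and after some gradient-descent steps, and measure how much it drifts.

```python
import numpy as np

rng = np.random.default_rng(3)
d, m, n = 4, 10, 8
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

def ntk(W, a):
    # Empirical NTK Gram matrix K = J J^T, where J stacks the per-input
    # gradients of f(x) = a . tanh(W x) wrt all parameters (a and W).
    H = np.tanh(X @ W.T)                                 # (n, m)
    Ja = H                                               # df/da
    JW = ((1 - H**2) * a)[:, :, None] * X[:, None, :]    # (n, m, d), df/dW
    J = np.concatenate([Ja, JW.reshape(n, m * d)], axis=1)
    return J @ J.T

W = rng.normal(size=(m, d)) / np.sqrt(d)
a = rng.normal(size=m) / np.sqrt(m)
K0 = ntk(W, a)
for _ in range(100):                 # plain gradient descent on squared loss
    H = np.tanh(X @ W.T)
    r = H @ a - y
    ga = H.T @ r / n
    gW = ((r[:, None] * (1 - H**2) * a).T @ X) / n
    a -= 0.1 * ga
    W -= 0.1 * gW
K1 = ntk(W, a)

drift = np.linalg.norm(K1 - K0) / np.linalg.norm(K0)
print(drift)   # nonzero: at finite width the kernel evolves during training
```

In the infinite-width limit this drift would vanish; at finite width it is the signature of the feature-learning phase the abstract describes.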


Adversarial Robustness is at Odds with Lazy Training

Neural Information Processing Systems

Recent works show that adversarial examples exist for random neural networks [Daniely and Schacham, 2020] and that these examples can be found using a single step of gradient ascent [Bubeck et al., 2021]. In this work, we extend this line of work to ``lazy training'' of neural networks -- a dominant model in deep learning theory in which neural networks are provably efficiently learnable. We show that over-parametrized neural networks that are guaranteed to generalize well and enjoy strong computational guarantees remain vulnerable to attacks generated using a single step of gradient ascent.
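A minimal sketch of such a single-step attack on a random two-layer ReLU network (widths, the budget eps, and the l2-normalized step are illustrative choices, not the constructions of the cited papers): take one gradient-ascent step on the output against its current sign and watch the margin shrink.

```python
import numpy as np

rng = np.random.default_rng(4)
d, m, eps = 20, 100, 0.5

# Random two-layer ReLU network, as in the random-network setting.
W = rng.normal(size=(m, d)) / np.sqrt(d)
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)

def f(x):
    return a @ np.maximum(W @ x, 0.0)       # network output (the "logit")

def grad_x(x):
    return W.T @ (a * (W @ x > 0))          # gradient of f wrt the input

# One step of gradient ascent against the predicted sign, per input.
drops = []
for x in rng.normal(size=(50, d)):
    g = grad_x(x)
    x_adv = x - np.sign(f(x)) * eps * g / np.linalg.norm(g)
    drops.append(np.sign(f(x)) * (f(x) - f(x_adv)))   # margin lost to the attack

print(np.mean(drops))   # positive on average: one step reliably erodes the margin
```

On average the single step removes a margin of roughly eps times the input-gradient norm, which is the mechanism behind the single-step attacks of Bubeck et al. [2021] that this paper extends to the lazy-training regime.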