AITopics | overparametrized neural network

Epistemic uncertainty is often viewed as a reducible uncertainty that vanishes with increasing data. This perspective implicitly assumes parameter identifiability and equates epistemic uncertainty with predictive variability. In overparametrized neural networks, however, model parameters are typically non-identifiable due to symmetries and redundant representations. As a consequence, substantial parameter uncertainty can persist even when the underlying function is fully identified. In this work, we analyze epistemic uncertainty through the lens of non-identifiability and characterize both discrete and continuous sources of residual uncertainty. Focusing on one-hidden-layer ReLU networks, we thoroughly analyze the resulting posterior structure and validate our theoretical insights through empirical studies.
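The non-identifiability described in this abstract can be illustrated with a minimal sketch. It assumes a scalar-input one-hidden-layer ReLU network of the form f(x) = sum_j a_j * relu(w_j * x + b_j); the parameter layout and the helper names (`forward`, `permute`, `rescale`) are hypothetical and are not taken from the paper. Permuting hidden units is a discrete symmetry, and rescaling a unit's incoming weights by c > 0 while dividing its outgoing weight by c is a continuous one. Both change the parameters but leave the function unchanged.

```python
import random

def relu(z):
    return z if z > 0.0 else 0.0

def forward(params, x):
    """One-hidden-layer ReLU network: sum_j a_j * relu(w_j * x + b_j)."""
    return sum(a * relu(w * x + b) for (w, b, a) in params)

def permute(params, order):
    """Discrete symmetry: reorder the hidden units."""
    return [params[i] for i in order]

def rescale(params, factors):
    """Continuous symmetry: scale (w_j, b_j) by c_j > 0 and a_j by 1 / c_j."""
    return [(c * w, c * b, a / c) for (w, b, a), c in zip(params, factors)]

# Hypothetical network with four hidden units, each stored as (w_j, b_j, a_j).
rng = random.Random(0)
params = [(rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(4)]

# A different point in parameter space that represents the same function.
transformed = rescale(permute(params, [2, 0, 3, 1]), [0.5, 2.0, 4.0, 0.25])

inputs = [-2.0, -0.5, 0.0, 0.7, 3.0]
max_gap = max(abs(forward(params, x) - forward(transformed, x)) for x in inputs)
print(max_gap)  # expected to be zero up to floating-point rounding
```

Because `params` and `transformed` agree on every input, data alone cannot distinguish them, which is one way parameter uncertainty can persist while the function itself is pinned down.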

artificial intelligence, epistemic uncertainty, machine learning, (15 more...)

arXiv.org Machine Learning

2605.25234

Country: Europe (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)


accordingly to incorporate the comments. Reviewer # 1: (Stepsize and preset T.) Following the current analysis, for a general stepsize η

Neural Information Processing Systems | Feb-13-2026, 02:17:47 GMT

We appreciate the valuable comments and positive feedback from the reviewers. Without averaging the iterates, no convergence rate is available. In this paper we consider a neural network with one hidden layer. In particular, Proposition 4.7 shows that neural TD attains the global minimum of MSBE (without the We will revise the "without loss of generality" claim in the revision. We will clarify this notation in the revision.

artificial intelligence, machine learning, neural network, (18 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)


Convergence of Adversarial Training in Overparametrized Neural Networks

Neural Information Processing Systems | Dec-25-2025, 05:41:21 GMT

Neural networks are vulnerable to adversarial examples, i.e. inputs that are imperceptibly perturbed from natural data and yet incorrectly classified by the network. Adversarial training \cite{madry2017towards}, a heuristic form of robust optimization that alternates between minimization and maximization steps, has proven to be among the most successful methods to train networks to be robust against a pre-defined family of perturbations. This paper provides a partial answer to the success of adversarial training, by showing that it converges to a network where the surrogate loss with respect to the attack algorithm is within $\epsilon$ of the optimal robust loss. Then we show that the optimal robust loss is also close to zero, hence adversarial training finds a robust classifier. The analysis technique leverages recent work on the analysis of neural networks via Neural Tangent Kernel (NTK), combined with motivation from online-learning when the maximization is solved by a heuristic, and the expressiveness of the NTK kernel in the $\ell_\infty$-norm. In addition, we also prove that robust interpolation requires more model capacity, supporting the evidence that adversarial training requires wider networks.

adversarial training, convergence, overparametrized neural network, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.54)


accordingly to incorporate the comments

Neural Information Processing Systems | Oct-3-2025, 06:52:55 GMT

We appreciate the valuable comments and positive feedback from the reviewers. Without averaging the iterates, no convergence rate is available. In particular, Proposition 4.7 shows that neural TD attains the global minimum of MSBE (without the We will revise the "without loss of generality" claim in the revision. We will clarify this notation in the revision. We will fix them in the revision.

convergence rate, neural network, revision, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)


Reviews: Convergence of Adversarial Training in Overparametrized Neural Networks

Neural Information Processing Systems | May-31-2025, 13:08:48 GMT

EDIT: I have read the author feedback and the authors have agreed to revise the writing. This is clearly a good paper that should be accepted. Two more comments regarding the rebuttal: (1) My original comments apply to natural training as well, and I understand this is a very challenging topic. For example, one such parameter is width. As far as I know, this is also the first such result for adversarial training.

adversarial training, dependence, overparametrized neural network, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.70)


Convergence of Adversarial Training in Overparametrized Neural Networks

Neural Information Processing Systems | May-27-2025, 09:39:14 GMT

Neural networks are vulnerable to adversarial examples, i.e. inputs that are imperceptibly perturbed from natural data and yet incorrectly classified by the network. Adversarial training \cite{madry2017towards}, a heuristic form of robust optimization that alternates between minimization and maximization steps, has proven to be among the most successful methods to train networks to be robust against a pre-defined family of perturbations. This paper provides a partial answer to the success of adversarial training, by showing that it converges to a network where the surrogate loss with respect to the attack algorithm is within $\epsilon$ of the optimal robust loss. Then we show that the optimal robust loss is also close to zero, hence adversarial training finds a robust classifier. The analysis technique leverages recent work on the analysis of neural networks via Neural Tangent Kernel (NTK), combined with motivation from online-learning when the maximization is solved by a heuristic, and the expressiveness of the NTK kernel in the $\ell_\infty$-norm.

adversarial training, convergence, overparametrized neural network, (1 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.89)


Symmetries in Overparametrized Neural Networks: A Mean Field View

Neural Information Processing Systems | May-27-2025, 07:28:09 GMT

We develop a Mean-Field (MF) view of the learning dynamics of overparametrized Artificial Neural Networks (NN) under distributional symmetries of the data w.r.t. the action of a general compact group $G$. We consider for this a class of generalized shallow NNs given by an ensemble of $N$ multi-layer units, jointly trained using stochastic gradient descent (SGD) and possibly symmetry-leveraging (SL) techniques, such as Data Augmentation (DA), Feature Averaging (FA) or Equivariant Architectures (EA). We introduce the notions of weakly and strongly invariant laws (WI and SI) on the parameter space of each single unit, corresponding, respectively, to $G$-invariant distributions, and to distributions supported on parameters fixed by the group action (which encode EA). This allows us to define symmetric models compatible with taking $N\to\infty$ and give an interpretation of the asymptotic dynamics of DA, FA and EA in terms of Wasserstein Gradient Flows describing their MF limits. When activations respect the group action, we show that, for symmetric data, DA, FA and freely-trained models obey the exact same MF dynamic, which stays in the space of WI parameter laws and attains therein the population risk's minimizer.

mean field view, overparametrized neural network, symmetry, (1 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.59)


Heavy Tails in SGD and Compressibility of Overparametrized Neural Networks

Neural Information Processing Systems | May-27-2025, 06:56:42 GMT

Neural network compression techniques have become increasingly popular as they can drastically reduce the storage and computation requirements for very large networks. Recent empirical studies have illustrated that even simple pruning strategies can be surprisingly effective, and several theoretical studies have shown that compressible networks (in specific senses) should achieve a low generalization error. Yet, a theoretical characterization of the underlying causes that make the networks amenable to such simple compression schemes is still missing. In this study, focusing our attention on stochastic gradient descent (SGD), our main contribution is to link compressibility to two recently established properties of SGD: (i) as the network size goes to infinity, the system can converge to a mean-field limit, where the network weights behave independently [DBDFŞ20], (ii) for a large step-size/batch-size ratio, the SGD iterates can converge to a heavy-tailed stationary distribution [HM20, GŞZ21]. Assuming that both of these phenomena occur simultaneously, we prove that the networks are guaranteed to be '$\ell_p$-compressible', and the compression errors of different pruning techniques (magnitude, singular value, or node pruning) become arbitrarily small as the network size increases. We further prove generalization bounds adapted to our theoretical framework, which are consistent with the observation that the generalization error will be lower for more compressible networks.
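The link between heavy tails and magnitude pruning stated in this abstract can be made concrete with a rough sketch. It is not the paper's experiment: the i.i.d. weights stand in for the mean-field independence assumption, the symmetrized Pareto distribution with tail index 1.5 is an arbitrary heavy-tailed choice, the Gaussian sample is a light-tailed baseline, and the helper names are hypothetical. The sketch keeps the largest 10% of weights by magnitude and measures the relative $\ell_2$ error of the pruned vector.

```python
import math
import random

def magnitude_prune(weights, keep_fraction):
    """Keep the largest-magnitude entries and set the rest to zero."""
    k = max(1, round(keep_fraction * len(weights)))
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    keep = set(order[:k])
    return [w if i in keep else 0.0 for i, w in enumerate(weights)]

def relative_l2_error(weights, pruned):
    """Compute ||w - pruned||_2 / ||w||_2."""
    num = math.sqrt(sum((w - p) ** 2 for w, p in zip(weights, pruned)))
    den = math.sqrt(sum(w * w for w in weights))
    return num / den

rng = random.Random(0)
n = 5000

# Light-tailed baseline (assumption: standard Gaussian weights).
gaussian = [rng.gauss(0.0, 1.0) for _ in range(n)]
# Heavy-tailed weights (assumption: symmetrized Pareto, tail index 1.5).
heavy = [rng.choice([-1.0, 1.0]) * rng.paretovariate(1.5) for _ in range(n)]

gaussian_error = relative_l2_error(gaussian, magnitude_prune(gaussian, 0.1))
heavy_error = relative_l2_error(heavy, magnitude_prune(heavy, 0.1))
print(gaussian_error, heavy_error)
```

In the heavy-tailed sample a few large entries carry most of the norm, so discarding the small ones costs comparatively little, which is the intuition behind the compressibility claim.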

artificial intelligence, machine learning, overparametrized neural network, (5 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.61)


Reviews: Convergence of Adversarial Training in Overparametrized Neural Networks

Neural Information Processing Systems | Jan-22-2025, 19:43:47 GMT

The reviewers agreed the contributions made in this submission are significant and they all recommended acceptance.

artificial intelligence, machine learning, overparametrized neural network, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.40)


Heavy Tails in SGD and Compressibility of Overparametrized Neural Networks

Neural Information Processing Systems | Jan-19-2025, 14:06:28 GMT

Neural network compression techniques have become increasingly popular as they can drastically reduce the storage and computation requirements for very large networks. Recent empirical studies have illustrated that even simple pruning strategies can be surprisingly effective, and several theoretical studies have shown that compressible networks (in specific senses) should achieve a low generalization error. Yet, a theoretical characterization of the underlying causes that make the networks amenable to such simple compression schemes is still missing. In this study, focusing our attention on stochastic gradient descent (SGD), our main contribution is to link compressibility to two recently established properties of SGD: (i) as the network size goes to infinity, the system can converge to a mean-field limit, where the network weights behave independently [DBDFŞ20], (ii) for a large step-size/batch-size ratio, the SGD iterates can converge to a heavy-tailed stationary distribution [HM20, GŞZ21]. Assuming that both of these phenomena occur simultaneously, we prove that the networks are guaranteed to be '$\ell_p$-compressible', and the compression errors of different pruning techniques (magnitude, singular value, or node pruning) become arbitrarily small as the network size increases. We further prove generalization bounds adapted to our theoretical framework, which are consistent with the observation that the generalization error will be lower for more compressible networks.

heavy tail, overparametrized neural network, sgd and compressibility, (3 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.61)

On the Epistemic Uncertainty of Overparametrized Neural Networks