AITopics | overparametrization

Collaborating Authors

overparametrization

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

On the Interplay of Priors and Overparametrization in Bayesian Neural Network Posteriors

Kobialka, Julius, Sommer, Emanuel, Kolb, Chris, Kwon, Juntae, Dold, Daniel, Rügamer, David

arXiv.org Machine LearningMar-24-2026

Bayesian neural network (BNN) posteriors are often considered impractical for inference, as symmetries fragment them, non-identifiabilities inflate dimensionality, and weight-space priors are seen as meaningless. In this work, we study how overparametrization and priors together reshape BNN posteriors and derive implications allowing us to better understand their interplay. We show that redundancy introduces three key phenomena that fundamentally reshape the posterior geometry: balancedness, weight reallocation on equal-probability manifolds, and prior conformity. We validate our findings through extensive experiments with posterior sampling budgets that far exceed those of earlier works, and demonstrate how overparametrization induces structured, prior-aligned weight posterior distributions.

artificial intelligence, international conference, machine learning, (14 more...)

arXiv.org Machine Learning

2603.2203

Country:

North America > United States > California > Orange County > Irvine (0.04)
Africa > Middle East > Morocco > Tanger-Tetouan-Al Hoceima Region > Tangier (0.04)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Overparametrization bends the landscape: BBP transitions at initialization in simple Neural Networks

Annesi, Brandon Livio, Bocchi, Dario, Cammarota, Chiara

arXiv.org Machine LearningOct-22-2025

High-dimensional non-convex loss landscapes play a central role in the theory of Machine Learning. Gaining insight into how these landscapes interact with gradient-based optimization methods, even in relatively simple models, can shed light on this enigmatic feature of neural networks. In this work, we will focus on a prototypical simple learning problem, which generalizes the Phase Retrieval inference problem by allowing the exploration of overparametrized settings. Using techniques from field theory, we analyze the spectrum of the Hessian at initialization and identify a Baik-Ben Arous-Péché (BBP) transition in the amount of data that separates regimes where the initialization is informative or uninformative about a planted signal of a teacher-student setup. Crucially, we demonstrate how overparameterization can bend the loss landscape, shifting the transition point, even reaching the information-theoretic weak-recovery threshold in the large overparameterization limit, while also altering its qualitative nature. We distinguish between continuous and discontinuous BBP transitions and support our analytical predictions with simulations, examining how they compare to the finite-N behavior. In the case of discontinuous BBP transitions strong finite-N corrections allow the retrieval of information at a signal-to-noise ratio (SNR) smaller than the predicted BBP transition. In these cases we provide estimates for a new lower SNR threshold that marks the point at which initialization becomes entirely uninformative.

artificial intelligence, machine learning, transition, (18 more...)

arXiv.org Machine Learning

2510.18435

Country:

Africa > Middle East > Tunisia > Ben Arous Governorate > Ben Arous (0.24)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Italy (0.04)

Genre: Research Report (0.50)

Industry: Education (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Online reinforcement learning via sparse Gaussian mixture model Q-functions

Vu, Minh, Slavakis, Konstantinos

arXiv.org Artificial IntelligenceSep-19-2025

This paper introduces a structured and interpretable online policy-iteration framework for reinforcement learning (RL), built around the novel class of sparse Gaussian mixture model Q-functions (S-GMM-QFs). Extending earlier work that trained GMM-QFs offline, the proposed framework develops an online scheme that leverages streaming data to encourage exploration. Model complexity is regulated through sparsification by Hadamard overparametrization, which mitigates overfitting while preserving expressiveness. The parameter space of S-GMM-QFs is naturally endowed with a Riemannian manifold structure, allowing for principled parameter updates via online gradient descent on a smooth objective. Numerical tests show that S-GMM-QFs match the performance of dense deep RL (DeepRL) methods on standard benchmarks while using significantly fewer parameters, and maintain strong performance even in low-parameter-count regimes where sparsified DeepRL methods fail to generalize.

machine learning, q-function, reinforcement learning, (12 more...)

arXiv.org Artificial Intelligence

2509.14585

Country:

North America > United States (1.00)
Europe (1.00)
Asia (0.68)

Genre:

Instructional Material > Online (0.40)
Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Author Response to Reviews

Neural Information Processing SystemsAug-22-2025, 02:23:50 GMT

Thank you for your time in reading the paper and the positive feedback! Below are responses for each reviewer. Thank you for your detailed reading of the paper and positive feedback! With even more runs, we expect this distinction will be even clearer. This is the core of the transfer learning question, and a central part of our paper.

author response, positive feedback, random initialization, (15 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Benignity of loss landscape with weight decay requires both large overparametrization and initialization

Boursier, Etienne, Bowditch, Matthew, Englert, Matthias, Lazic, Ranko

arXiv.org Artificial IntelligenceMay-29-2025

The optimization of neural networks under weight decay remains poorly understood from a theoretical standpoint. While weight decay is standard practice in modern training procedures, most theoretical analyses focus on unregularized settings. In this work, we investigate the loss landscape of the $\ell_2$-regularized training loss for two-layer ReLU networks. We show that the landscape becomes benign -- i.e., free of spurious local minima -- under large overparametrization, specifically when the network width $m$ satisfies $m \gtrsim \min(n^d, 2^n)$, where $n$ is the number of data points and $d$ the input dimension. More precisely in this regime, almost all constant activation regions contain a global minimum and no spurious local minima. We further show that this level of overparametrization is not only sufficient but also necessary via the example of orthogonal data. Finally, we demonstrate that such loss landscape results primarily hold relevance in the large initialization regime. In contrast, for small initializations -- corresponding to the feature learning regime -- optimization can still converge to spurious local minima, despite the global benignity of the landscape.

artificial intelligence, machine learning, optimization problem, (18 more...)

arXiv.org Artificial Intelligence

2505.22578

Country: Europe (0.28)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)

Add feedback

Review for NeurIPS paper: Robust Recovery via Implicit Bias of Discrepant Learning Rates for Double Over-parameterization

Neural Information Processing SystemsFeb-6-2025, 09:27:12 GMT

Additional Feedback: To be honest, I find the term "double overparametrization" a bit strange. I would still call it simply "overparametrization". Perhaps, the authors could think about this point and potentially adjust. I would suggest that the authors briefly discuss the following point which is sometimes overlooked when discussing implicit bias of gradient descent in the context of low rank matrix recovery. When additional restricting to positive semidefinite matrices it turns out that the original low rank matrix is often the UNIQUE solution to the linear equation y A(X) that is positive semidefinite, see the paper "Implicit regularization and solution uniqueness in over-parameterized matrix sensing" by Geyer et al., arxiv:806.02046,

discrepant learning rate, double over-parameterization, robust recovery, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.77)

Add feedback

Recovery Guarantees of Unsupervised Neural Networks for Inverse Problems trained with Gradient Descent

Buskulic, Nathan, Fadili, Jalal, Quéau, Yvain

arXiv.org Artificial IntelligenceMar-8-2024

Advanced machine learning methods, and more prominently neural networks, have become standard to solve inverse problems over the last years. However, the theoretical recovery guarantees of such methods are still scarce and difficult to achieve. Only recently did unsupervised methods such as Deep Image Prior (DIP) get equipped with convergence and recovery guarantees for generic loss functions when trained through gradient flow with an appropriate initialization. In this paper, we extend these results by proving that these guarantees hold true when using gradient descent with an appropriately chosen step-size/learning rate. We also show that the discretization only affects the overparametrization bound for a two-layer DIP network by a constant and thus that the different guarantees found for the gradient flow will hold for gradient descent.

artificial intelligence, converge, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2403.05395

Country:

Europe > Italy > Emilia-Romagna > Metropolitan City of Bologna > Bologna (0.04)
Europe > France (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.93)

Add feedback

The Lattice Overparametrization Paradigm for the Machine Learning of Lattice Operators

Marcondes, Diego, Barrera, Junior

arXiv.org Artificial IntelligenceJan-26-2024

The machine learning of lattice operators has three possible bottlenecks. From a statistical standpoint, it is necessary to design a constrained class of operators based on prior information with low bias, and low complexity relative to the sample size. From a computational perspective, there should be an efficient algorithm to minimize an empirical error over the class. From an understanding point of view, the properties of the learned operator need to be derived, so its behavior can be theoretically understood. The statistical bottleneck can be overcome due to the rich literature about the representation of lattice operators, but there is no general learning algorithm for them. In this paper, we discuss a learning paradigm in which, by overparametrizing a class via elements in a lattice, an algorithm for minimizing functions in a lattice is applied to learn. We present the stochastic lattice descent algorithm as a general algorithm to learn on constrained classes of operators as long as a lattice overparametrization of it is fixed, and we discuss previous works which are proves of concept. Moreover, if there are algorithms to compute the basis of an operator from its overparametrization, then its properties can be deduced and the understanding bottleneck is also overcome. This learning paradigm has three properties that modern methods based on neural networks lack: control, transparency and interpretability. Nowadays, there is an increasing demand for methods with these characteristics, and we believe that mathematical morphology is in a unique position to supply them. The lattice overparametrization paradigm could be a missing piece for it to achieve its full potential within modern machine learning.

algorithm, operator, overparametrization, (15 more...)

arXiv.org Artificial Intelligence

2310.06639

Country:

South America > Brazil > São Paulo (0.05)
North America > United States > Texas (0.04)
North America > United States > New York (0.04)
(2 more...)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.91)

Add feedback

Convergence and Recovery Guarantees of Unsupervised Neural Networks for Inverse Problems

Buskulic, Nathan, Fadili, Jalal, Quéau, Yvain

arXiv.org Artificial IntelligenceOct-17-2023

Neural networks have become a prominent approach to solve inverse problems in recent years. While a plethora of such methods was developed to solve inverse problems empirically, we are still lacking clear theoretical guarantees for these methods. On the other hand, many works proved convergence to optimal solutions of neural networks in a more general setting using overparametrization as a way to control the Neural Tangent Kernel. In this work we investigate how to bridge these two worlds and we provide deterministic convergence and recovery guarantees for the class of unsupervised feedforward multilayer neural networks trained to solve inverse problems. We also derive overparametrization bounds under which a two-layers Deep Inverse Prior network with smooth activation function will benefit from our guarantees.

inequality, neural network, overparametrization, (16 more...)

arXiv.org Artificial Intelligence

2309.12128

Country:

North America > United States > Rhode Island > Providence County > Providence (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Italy > Emilia-Romagna > Metropolitan City of Bologna > Bologna (0.04)
Europe > France (0.04)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

A Law of Robustness beyond Isoperimetry

Wu, Yihan, Huang, Heng, Zhang, Hongyang

arXiv.org Artificial IntelligenceMay-31-2023

We study the robust interpolation problem of arbitrary data distributions supported on a bounded space and propose a two-fold law of robustness. Robust interpolation refers to the problem of interpolating $n$ noisy training data points in $\mathbb{R}^d$ by a Lipschitz function. Although this problem has been well understood when the samples are drawn from an isoperimetry distribution, much remains unknown concerning its performance under generic or even the worst-case distributions. We prove a Lipschitzness lower bound $\Omega(\sqrt{n/p})$ of the interpolating neural network with $p$ parameters on arbitrary data distributions. With this result, we validate the law of robustness conjecture in prior work by Bubeck, Li, and Nagaraj on two-layer neural networks with polynomial weights. We then extend our result to arbitrary interpolating approximators and prove a Lipschitzness lower bound $\Omega(n^{1/d})$ for robust interpolation. Our results demonstrate a two-fold law of robustness: i) we show the potential benefit of overparametrization for smooth data interpolation when $n=\mathrm{poly}(d)$, and ii) we disprove the potential existence of an $O(1)$-Lipschitz robust interpolating function when $n=\exp(\omega(d))$.

artificial intelligence, machine learning, robustness, (15 more...)

arXiv.org Artificial Intelligence

2202.11592

Country:

North America > United States > Maryland (0.04)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(2 more...)

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.87)

Add feedback