AITopics | overparameterized network

The Effect of Label Noise on the Information Content of Neural Representations

Umar, Ali Hussaini, Tezoh, Franky Kevin Nando, Barbier, Jean, Acevedo, Santiago, Laio, Alessandro

arXiv.org Machine Learning, Oct-9-2025

In supervised classification tasks, models are trained to predict a label for each data point. In real-world datasets, these labels are often noisy due to annotation errors. While the impact of label noise on the performance of deep learning models has been widely studied, its effects on the networks' hidden representations remain poorly understood. We address this gap by systematically comparing hidden representations using the Information Imbalance, a computationally efficient proxy of conditional mutual information. Through this analysis, we observe that the information content of the hidden representations follows a double descent as a function of the number of network parameters, akin to the behavior of the test error. We further demonstrate that in the underparameterized regime, representations learned with noisy labels are more informative than those learned with clean labels, while in the overparameterized regime, these representations are equally informative. Our results indicate that the representations of overparameterized networks are robust to label noise. We also find that the information imbalance between the penultimate and pre-softmax layers decreases with cross-entropy loss in the overparameterized regime. This offers a new perspective on understanding generalization in classification tasks. Extending our analysis to representations learned from random labels, we show that these perform worse than random features. This indicates that training on random labels drives networks well beyond lazy learning, as weights adapt to encode label information.
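The abstract names the Information Imbalance but does not give its formula. As a rough illustration, the sketch below assumes the commonly cited rank-based definition: for each point, find its nearest neighbor in representation A, look up that neighbor's distance rank in representation B, and scale the average rank by 2/N. Values near 0 suggest A predicts B's neighborhoods well; values near 1 suggest it does not. The function names and the Euclidean distance are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def pairwise_sq_dists(x):
    """Squared Euclidean distances between rows of x, with self-distances set to inf."""
    sq = np.sum(x ** 2, axis=1)
    d = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)
    np.fill_diagonal(d, np.inf)  # a point is never its own neighbor
    return d


def information_imbalance(a, b):
    """Assumed definition: (2 / N) * mean rank in B of each point's nearest neighbor in A."""
    n = a.shape[0]
    d_a = pairwise_sq_dists(a)
    d_b = pairwise_sq_dists(b)
    nn_a = np.argmin(d_a, axis=1)        # nearest neighbor of each point in space A
    d_b_nn = d_b[np.arange(n), nn_a]     # distance to that same neighbor in space B
    # Rank in B = 1 + number of other points strictly closer than that neighbor.
    ranks_b = 1 + np.sum(d_b < d_b_nn[:, None], axis=1)
    return 2.0 / n * ranks_b.mean()
```

Here `a` and `b` would be arrays of shape (N, features) holding two hidden representations of the same N inputs, for example the penultimate and pre-softmax activations. The measure is asymmetric, so `information_imbalance(a, b)` and `information_imbalance(b, a)` generally differ.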

information content, label noise, representation, (14 more...)

2510.06401

Country:

Europe > Italy > Friuli Venezia Giulia > Trieste Province > Trieste (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Africa > Middle East > Tunisia > Ben Arous Governorate > Ben Arous (0.04)

Genre: Research Report > New Finding (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)

Reviews: Convergence of Adversarial Training in Overparametrized Neural Networks

Neural Information Processing Systems, May-31-2025, 13:08:48 GMT

EDIT: I have read the author feedback and the authors have agreed to revise the writing. This is clearly a good paper that should be accepted. Two more comments regarding the rebuttal: (1) My original comments apply to natural training as well, and I understand this is a very challenging topic. For example, one such parameter is width. As far as I know, this is also the first such result for adversarial training.

adversarial training, dependence, overparametrized neural network, (8 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.70)

A General Framework of the Consistency for Large Neural Networks

Zhan, Haoran, Xia, Yingcun

arXiv.org Machine Learning, Oct-2-2024

Neural networks have shown remarkable success, especially in overparameterized or "large" models. Despite increasing empirical evidence and intuitive understanding, a formal mathematical justification for the behavior of such models, particularly regarding overfitting, remains incomplete. In this paper, we propose a general regularization framework to study the Mean Integrated Squared Error (MISE) of neural networks. This framework includes many commonly used neural networks and penalties, such as ReLU and Sigmoid activations and $L^1$, $L^2$ penalties. Based on our framework, we find that the MISE curve has two possible shapes: double descent or monotone decrease. The latter phenomenon is new in the literature, and the causes of these two phenomena are also studied in theory. These studies challenge conventional statistical modeling frameworks and broaden recent findings on the double descent phenomenon in neural networks.
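The abstract does not spell out the estimator or the error measure. For orientation only, a standard textbook form of a penalized least-squares fit and of the MISE is sketched below; the symbols $\hat{f}_n$, $f_\theta$, $\lambda_n$, $J$, $f$ and $\mu$ are assumptions introduced here and may not match the paper's notation.

```latex
% Penalized least-squares estimator over a network class \{ f_\theta \} (assumed form)
\hat{f}_n = f_{\hat{\theta}}, \qquad
\hat{\theta} \in \arg\min_{\theta}\;
  \frac{1}{n} \sum_{i=1}^{n} \bigl( Y_i - f_\theta(X_i) \bigr)^2 + \lambda_n J(\theta),
\qquad
J(\theta) = \|\theta\|_1 \ \text{(an $L^1$ penalty)} \quad \text{or} \quad
J(\theta) = \|\theta\|_2^2 \ \text{(an $L^2$ penalty)}

% Mean Integrated Squared Error against the true regression function f,
% integrated over the input distribution \mu
\mathrm{MISE}(\hat{f}_n) = \mathbb{E} \int \bigl( \hat{f}_n(x) - f(x) \bigr)^2 \, d\mu(x)
```

Under this reading, the "MISE curve" is the MISE plotted against model size, and the two shapes the abstract reports describe how that curve behaves as the network grows.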

generalization error, inequality, neural network, (15 more...)

2409.14123

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Singapore (0.04)
Asia > China (0.04)

Genre: Research Report (0.63)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

How Does Overparameterization Affect Features?

Duzgun, Ahmet Cagri, Jelassi, Samy, Li, Yuanzhi

arXiv.org Artificial Intelligence, Jul-1-2024

Overparameterization, the condition where models have more parameters than necessary to fit their training loss, is a crucial factor for the success of deep learning. However, the characteristics of the features learned by overparameterized networks are not well understood. In this work, we explore this question by comparing models with the same architecture but different widths. We first examine the expressivity of the features of these models, and show that the feature space of overparameterized networks cannot be spanned by concatenating many underparameterized features, and vice versa. This reveals that both overparameterized and underparameterized networks acquire some distinctive features. We then evaluate the performance of these models, and find that overparameterized networks outperform underparameterized networks, even when many of the latter are concatenated. We corroborate these findings using VGG-16 and ResNet18 on CIFAR-10 and a Transformer on the MNLI classification dataset. Finally, we propose a toy setting to explain how overparameterized networks can learn some important features that the underparameterized networks cannot learn. Overparameterized neural networks, which have more parameters than necessary to fit the training data, have achieved remarkable success in various tasks, such as image classification (He et al., 2016; Krizhevsky et al., 2017), object detection (Girshick et al., 2014; Redmon et al., 2016) or text classification (Zhang et al., 2015; Johnson & Zhang, 2016). However, the theoretical understanding of why these networks outperform underparameterized ones, which have fewer parameters and less capacity, is still limited.
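One way to picture the "can one feature space be spanned by another" question is a linear regression from one set of features onto the other, followed by a check of how much is left unexplained. The sketch below is a hypothetical illustration of that idea, not the paper's metric: `span_residual` and its normalization are assumptions made here, and a careful study would measure the residual on held-out inputs.

```python
import numpy as np


def span_residual(source_feats, target_feats):
    """Fraction of target-feature variance that a linear map of the source features cannot explain.

    Both arguments have shape (num_inputs, num_features) and must describe the same inputs.
    Returns a value near 0 if the target lies in the span of the source, near 1 if it does not.
    """
    src = source_feats - source_feats.mean(axis=0)
    tgt = target_feats - target_feats.mean(axis=0)
    # Least-squares linear map from source features to target features.
    coef, *_ = np.linalg.lstsq(src, tgt, rcond=None)
    resid = tgt - src @ coef
    return float(np.sum(resid ** 2) / np.sum(tgt ** 2))
```

To mimic the concatenation setup described above, features from several narrow networks could be stacked with `np.concatenate([f1, f2, f3], axis=1)` and passed as `source_feats`, with a wide network's features as `target_feats`, and then the roles swapped.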

experiment, fse, overparameterized network, (16 more...)

2407.00968

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Random Search as a Baseline for Sparse Neural Network Architecture Search

Farahani, Rezsa

arXiv.org Artificial Intelligence, Mar-14-2024

Overparameterized neural networks are loosely characterized as networks that have a very high fitting (or memorization) capacity with respect to their training data. Although capable of memorizing their training data, these networks intriguingly achieve very low test error, close to their training error rates [1, 2]. Meanwhile, sparse neural networks have shown similar or better generalization performance than their dense counterparts while having higher parameter efficiency [3]. With increasing availability of hardware and software that support sparse computational operations [4, 5], there has been a growing interest in finding sparse sub-networks within large overparameterized models to either improve generalization performance or to gain computational efficiency at the same performance level [6, 7, 8, 3]. Earlier works on creating efficient sparse sub-networks include the now popular pruning technique [9]. These were motivated by the desire to achieve compute efficiency in resource-constrained applications by finding smaller networks within a larger network space without losing task performance quality [10]. The original pruning technique involves fully training a larger network on some task, discarding the task-irrelevant connections, and then fine-tuning the remaining sparse sub-network on the task to achieve a level of performance near that of the larger network. Connections were originally pruned based on loss Hessians [9, 11]. Later on, other techniques were proposed, such as the removal of weak connections [12] based on weight value thresholds.
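The threshold-based removal of weak connections mentioned above can be sketched in a few lines, alongside a uniformly random mask of the same sparsity as a simple point of comparison. This is a minimal illustration under assumptions: the function names are hypothetical, `sparsity` means the fraction of weights removed, and the excerpt does not describe the paper's actual search procedure, so the random mask here is only a stand-in for the general idea of a random baseline.

```python
import numpy as np


def magnitude_prune_mask(weights, sparsity):
    """Boolean mask that drops the `sparsity` fraction of weights with the smallest magnitude."""
    k = int(round(sparsity * weights.size))  # number of connections to remove
    mask = np.ones(weights.size, dtype=bool)
    if k > 0:
        flat = np.abs(weights).ravel()
        drop = np.argpartition(flat, k - 1)[:k]  # indices of the k smallest |w|
        mask[drop] = False
    return mask.reshape(weights.shape)


def random_mask(shape, sparsity, rng):
    """Boolean mask that drops the same fraction of connections, chosen uniformly at random."""
    size = int(np.prod(shape))
    k = int(round(sparsity * size))
    mask = np.ones(size, dtype=bool)
    mask[rng.choice(size, size=k, replace=False)] = False
    return mask.reshape(shape)
```

Either mask would be applied as `weights * mask` before fine-tuning the remaining sub-network, with the mask held fixed so that pruned connections stay at zero.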

neural network, random search, sparsity, (12 more...)

2403.08265

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)
