AITopics | overfit

How Does Label Noise Gradient Descent Improve Generalization in the Low SNR Regime?

Neural Information Processing Systems, Jun-14-2026, 08:17:17 GMT

The capacity of deep learning models is often large enough to both learn the underlying statistical signal and overfit to noise in the training set. This noise memorization can be especially harmful for data with a low signal-to-noise ratio (SNR), leading to poor generalization. Inspired by prior observations that label noise provides implicit regularization that improves generalization, we investigate whether introducing label noise to the gradient updates can enhance the test performance of neural networks (NNs) in the low SNR regime. Specifically, we consider training a two-layer NN with a simple label noise gradient descent (GD) algorithm in an idealized signal-noise data setting. We prove that adding label noise during training suppresses noise memorization, preventing it from dominating the learning process; consequently, label noise GD enjoys rapid signal growth while the overfitting remains controlled, thereby achieving good generalization despite the low SNR. In contrast, we also show that an NN trained with standard GD tends to overfit to noise in the same low SNR setting, and we establish a non-vanishing lower bound on its test error, thus demonstrating the benefit of introducing label noise in gradient-based training.
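To make the idea of label noise GD concrete, here is a minimal sketch in Python. It is not the paper's exact algorithm or data model. The noise model (each ±1 label flipped independently with probability `flip_prob` at every step), the ReLU activation, the fixed second layer, the logistic loss, and the names `label_noise_gd` and `predict` are all assumptions made for illustration.

```python
import numpy as np

def label_noise_gd(X, y, width=8, lr=0.1, steps=200, flip_prob=0.1, seed=0):
    """Train f(x) = a . relu(W x) with gradients computed on noisy labels.

    Assumed setup (illustrative only): labels y are in {-1, +1}; at every
    step each label is flipped independently with probability `flip_prob`.
    Setting flip_prob=0 gives standard full-batch GD.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(scale=0.01, size=(width, d))                  # trainable first layer
    a = np.where(np.arange(width) % 2 == 0, 1.0, -1.0) / width   # fixed second layer
    for _ in range(steps):
        # Resample the label noise at every iteration.
        flips = rng.random(n) < flip_prob
        y_noisy = np.where(flips, -y, y)
        pre = X @ W.T                        # pre-activations, shape (n, width)
        out = np.maximum(pre, 0.0) @ a       # network outputs, shape (n,)
        # Logistic loss log(1 + exp(-y f)); its derivative w.r.t. f is
        # -y * sigmoid(-y f) = -y / (1 + exp(y f)).
        g_out = -y_noisy / (1.0 + np.exp(y_noisy * out))
        # Chain rule through the ReLU and the fixed second layer.
        grad_W = ((g_out[:, None] * (pre > 0)) * a[None, :]).T @ X / n
        W -= lr * grad_W
    return W, a

def predict(W, a, X):
    """Sign of the network output for each row of X."""
    return np.sign(np.maximum(X @ W.T, 0.0) @ a)
```

Running the same function with `flip_prob=0` and with a positive `flip_prob` on identical data and seed gives a direct comparison between standard GD and the noisy variant under these assumptions.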

artificial intelligence, machine learning, proceedings, (9 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.63)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.60)

Training the Untrainable: Introducing Inductive Bias via Representational Alignment

Neural Information Processing Systems, Jun-13-2026, 05:30:17 GMT

We demonstrate that architectures which are traditionally considered ill-suited for a task can be trained using inductive biases from another architecture. We call a network untrainable when it overfits, underfits, or converges to poor results even after its hyperparameters are tuned. For example, fully connected networks overfit on object recognition, while deep convolutional networks without residual connections underfit. The traditional answer is to change the architecture to impose some inductive bias, although the nature of that bias is unknown. We introduce guidance, where a guide network steers a target network using a neural distance function.

architecture, artificial intelligence, proceedings, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.59)

Towards precision protein-ligand affinity prediction benchmark: A Complete and Modification-Aware DAVIS Dataset

Neural Information Processing Systems, Jun-12-2026, 12:35:09 GMT

Advancements in AI for science unlock capabilities for critical drug discovery tasks such as protein-ligand binding affinity prediction. However, current models overfit to existing oversimplified datasets that do not represent naturally occurring and biologically relevant proteins with modifications. In this work, we curate a complete and modification-aware version of the widely used DAVIS dataset by incorporating 4,032 kinase-ligand pairs involving substitutions, insertions, deletions, and phosphorylation events. This enriched dataset enables benchmarking of predictive models under biologically realistic conditions. Based on this new dataset, we propose three benchmark settings (Augmented Dataset Prediction, Wild-Type to Modification Generalization, and Few-Shot Modification Generalization) designed to assess model robustness in the presence of protein modifications. Through extensive evaluation of both docking-free and docking-based methods, we find that docking-based models generalize better in zero-shot settings. In contrast, docking-free models tend to overfit to wild-type proteins and struggle with unseen modifications, but show notable improvement when fine-tuned on a small set of modified examples. We anticipate that the curated dataset and benchmarks offer a valuable foundation for developing models that better generalize to protein modifications, ultimately advancing precision medicine in drug discovery.

artificial intelligence, modeling & simulation, proceedings, (8 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence (0.76)
Information Technology > Modeling & Simulation (0.59)

A Careful Examination of Large Language Model Performance on Grade School Arithmetic

Neural Information Processing Systems, Feb-13-2026, 09:40:39 GMT

Further analysis suggests a positive relationship (Spearman's r

large language model, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country:

Asia > Singapore (0.04)
Asia > Middle East > Jordan (0.04)
Asia > Indonesia > Bali (0.04)
(7 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

abe8e03e3ac71c2ec3bfb0de042638d8-Supplemental.pdf

Neural Information Processing Systems, Feb-10-2026, 14:38:16 GMT

c-bet, egocentric view, panoramic view, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.99)

Supplementary Materials for ViSER: Video-Specific Surface Embeddings for Articulated 3D Shape Reconstruction

Neural Information Processing Systems, Feb-10-2026, 08:32:23 GMT

In contrast, ViSER is able to correctly reconstruct the dancer.

artificial intelligence, incvpr, supplementarymaterialsforviser, (15 more...)

Neural Information Processing Systems

Country: Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.05)

Technology: Information Technology > Artificial Intelligence > Vision (0.50)

67b0579a7298d9cf39c59404d867bdd7-Paper-Conference.pdf

Neural Information Processing Systems, Feb-9-2026, 13:16:17 GMT

gradient, neural network, regularisation, (10 more...)

Neural Information Processing Systems

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Sweden > Vaestra Goetaland > Gothenburg (0.05)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)

1943102704f8f8f3302c2b730728e023-Supplemental.pdf

Neural Information Processing Systems, Feb-7-2026, 15:45:06 GMT

iso, pose estimation, ssl technique, (12 more...)

Neural Information Processing Systems

Country:

North America > Canada (0.05)
Asia > Singapore (0.05)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.55)
Information Technology > Artificial Intelligence > Vision > Video Understanding (0.49)
Information Technology > Artificial Intelligence > Robots > Humanoid Robots (0.43)

Autoencoders that don't overfit towards the Identity

Neural Information Processing Systems, Dec-24-2025, 19:16:43 GMT

Autoencoders (AE) aim to reproduce the output from the input. They may hence tend to overfit towards learning the identity-function between the input and output, i.e., they may predict each feature in the output from itself in the input. This is not useful, however, when AEs are used for prediction tasks in the presence of noise in the data. It may seem intuitively evident that this kind of overfitting is prevented by training a denoising AE, as the dropped-out features have to be predicted from the other features. In this paper, we consider linear autoencoders, as they facilitate analytic solutions, and first show that denoising / dropout actually prevents the overfitting towards the identity-function only to the degree that it is penalized by the induced L2-norm regularization. In the main theorem of this paper, we show that the emphasized denoising AE is indeed capable of completely eliminating the overfitting towards the identity-function. Our derivations reveal several new insights, including the closed-form solution of the full-rank model, as well as a new (near-)orthogonality constraint in the low-rank model. While this constraint is conceptually very different from the regularizers recently proposed, their resulting effects on the learned embeddings are empirically similar. Our experiments on three well-known data-sets corroborate the various theoretical insights derived in this paper.
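The pull towards the identity can be seen in the simplest case with a short sketch. It assumes a full-rank linear autoencoder with a plain L2 penalty, whose minimizer of ||X - XB||² + λ||B||² has the standard ridge closed form B = (XᵀX + λI)⁻¹XᵀX. This is not the paper's emphasized denoising solution; the function name `ridge_linear_autoencoder` and the synthetic data are hypothetical.

```python
import numpy as np

def ridge_linear_autoencoder(X, lam):
    """Closed-form minimizer of ||X - X B||_F^2 + lam * ||B||_F^2."""
    gram = X.T @ X
    d = gram.shape[0]
    # Solve (X^T X + lam I) B = X^T X instead of forming an explicit inverse.
    return np.linalg.solve(gram + lam * np.eye(d), gram)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))          # synthetic data, illustrative only

# Distance of the learned weights from the identity for shrinking penalties.
distances = {
    lam: np.linalg.norm(ridge_linear_autoencoder(X, lam) - np.eye(5))
    for lam in (100.0, 1.0, 1e-6)
}
for lam, dist in distances.items():
    print(f"lambda={lam:g}  ||B - I||_F = {dist:.3e}")
```

As the penalty shrinks, the learned matrix approaches the identity, so each feature is increasingly predicted from itself. Only the regularization strength limits this effect in the ridge setting sketched here.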

autoencoder, name change, overfit, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Robust Hypothesis Test for Nonlinear Effect with Gaussian Processes

Jeremiah Liu, Brent Coull

Neural Information Processing Systems, Nov-21-2025, 06:17:28 GMT

We pay special attention to the setting where the sample size n is small. This type of test carries concrete significance in scientific studies.

artificial intelligence, machine learning, modeling & simulation, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report (0.48)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.47)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.46)
