Noise Stability of Transformer Models

Themistoklis Haris, Zihan Zhang, Yuichi Yoshida

Feb-10-2026, arXiv.org Machine Learning

Understanding simplicity biases in deep learning offers a promising path toward developing reliable AI. A common metric for this, inspired by Boolean function analysis, is average sensitivity, which captures a model's robustness to single-token perturbations. We argue that average sensitivity has two key limitations: it lacks a natural generalization to real-valued domains, and it fails to explain the "junta-like" input dependence (reliance on only a few input coordinates) that we empirically observe in modern LLMs. To address these limitations, we propose noise stability as a more comprehensive simplicity metric. Noise stability measures a model's robustness to correlated noise applied to all input coordinates simultaneously. We provide a theoretical analysis of noise stability for single-layer attention and ReLU MLP layers, and we tackle the multi-layer propagation problem with a covariance interval propagation approach. Building on this theory, we develop a practical noise stability regularization method. Experiments on algorithmic and next-token-prediction tasks show that our regularizer consistently catalyzes grokking and accelerates training by approximately 35% and 75%, respectively.

Simplicity biases have been a promising direction of study in recent years (Shah et al., 2020; Vasudeva et al., 2024; Bhattamishra et al., 2022), as they provide a unifying framework for generalization, interpretability, and robustness. Neural networks, including Large Language Models (LLMs), often converge to the simplest possible functions that explain the training data.
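The contrast between average sensitivity and noise stability can be made concrete with the textbook Boolean-cube definitions. The sketch below is an illustration under those standard definitions, not the authors' code: the function names are hypothetical, and the real-valued version assumes the usual rho-correlated Gaussian construction y = rho * x + sqrt(1 - rho^2) * z, which may differ from the exact definition used in the paper. Average sensitivity counts single-coordinate flips that change the output, while noise stability correlates f(x) with f(y) when every coordinate is perturbed at once.

```python
import itertools
import math

import numpy as np


def average_sensitivity(f, n):
    """Average sensitivity (total influence) of f: {-1, 1}^n -> {-1, 1}.

    For a uniformly random x, counts how many single-coordinate flips
    change the output of f.
    """
    total = 0
    for x in itertools.product((-1, 1), repeat=n):
        fx = f(x)
        for i in range(n):
            y = x[:i] + (-x[i],) + x[i + 1:]  # flip coordinate i only
            total += int(f(y) != fx)
    return total / 2 ** n


def noise_stability(f, n, rho):
    """Exact Stab_rho[f] = E[f(x) f(y)] for rho-correlated x, y in {-1, 1}^n.

    Each coordinate of y is flipped independently with probability
    (1 - rho) / 2, so the noise hits all inputs simultaneously.
    """
    p_flip = (1 - rho) / 2
    total = 0.0
    for x in itertools.product((-1, 1), repeat=n):
        fx = f(x)
        for flips in itertools.product((False, True), repeat=n):
            k = sum(flips)
            weight = p_flip ** k * (1 - p_flip) ** (n - k)
            y = tuple(-xi if flip else xi for xi, flip in zip(x, flips))
            total += weight * fx * f(y)
    return total / 2 ** n


def gaussian_noise_stability(f, n, rho, num_samples=200_000, seed=0):
    """Monte Carlo estimate of E[f(x) f(y)] for rho-correlated Gaussians.

    Assumption: x ~ N(0, I_n) and y = rho * x + sqrt(1 - rho^2) * z with
    z ~ N(0, I_n); the paper's real-valued definition may differ.
    f maps an array of shape (num_samples, n) to shape (num_samples,).
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((num_samples, n))
    z = rng.standard_normal((num_samples, n))
    y = rho * x + math.sqrt(1 - rho ** 2) * z
    return float(np.mean(f(x) * f(y)))


def dictator(x):
    # Depends on a single coordinate (a 1-junta).
    return x[0]


def parity(x):
    # Depends on every coordinate; product of +/-1 entries.
    return math.prod(x)


def sign_of_first(x):
    # Real-valued analogue of the dictator: sign of the first coordinate.
    return np.sign(x[:, 0])


n, rho = 5, 0.5
print(average_sensitivity(dictator, n), noise_stability(dictator, n, rho))
print(average_sensitivity(parity, n), noise_stability(parity, n, rho))
print(gaussian_noise_stability(sign_of_first, n, rho), 2 / math.pi * math.asin(rho))
```

For n = 5 and rho = 0.5, the dictator has average sensitivity 1 and noise stability rho = 0.5, while parity has average sensitivity 5 and noise stability rho^5 (about 0.031). The Gaussian estimate for the sign function should land near Sheppard's value (2/pi) * arcsin(0.5) = 1/3, showing that the same correlation-based definition carries over to real-valued inputs without modification.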

large language model, machine learning, natural language

Country:
- North America > United States (0.28)
- Asia > Japan
  - Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)

Genre:
- Research Report (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)