A Theoretical Framework for OOD Robustness in Transformers using Gevrey Classes

Jun-2-2025–arXiv.org Artificial Intelligence

We study the robustness of Transformer language models under semantic out-of-distribution (OOD) shifts, where training and test data lie in disjoint latent spaces. Using Wasserstein-1 distance and Gevrey-class smoothness, we derive sub-exponential upper bounds on prediction error. Our theoretical framework explains how smoothness governs generalization under distributional drift. We validate these findings through controlled experiments on arithmetic and Chain-of-Thought tasks with latent permutations and scalings. Results show empirical degradation aligns with our bounds, highlighting the geometric and functional principles underlying OOD generalization in Transformers.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

Jun-2-2025

arXiv.org PDF

Add feedback

Country:
- Asia (0.28)

Genre:
- Research Report > New Finding (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Representation & Reasoning > Uncertainty
    - Bayesian Inference (1.00)
  - Machine Learning
    - Neural Networks > Deep Learning (0.68)
    - Learning Graphical Models > Directed Networks
      - Bayesian Learning (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found