Bayes optimal learning of attention-indexed models

Boncoraglio, Fabrizio, Troiani, Emanuele, Erba, Vittorio, Zdeborová, Lenka

Jun-3-2025–arXiv.org Machine Learning

We introduce the attention-indexed model (AIM), a theoretical framework for analyzing learning in deep attention layers. Inspired by multi-index models, AIM captures how token-level outputs emerge from layered bilinear interactions over high-dimensional embeddings. Unlike prior tractable attention models, AIM allows full-width key and query matrices, aligning more closely with practical transformers. Using tools from statistical mechanics and random matrix theory, we derive closed-form predictions for Bayes-optimal generalization error and identify sharp phase transitions as a function of sample complexity, model width, and sequence length. We propose a matching approximate message passing algorithm and show that gradient descent can reach optimal performance. AIM offers a solvable playground for understanding learning in modern attention architectures.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Machine Learning

Jun-3-2025

arXiv.org PDF

Add feedback

Country:
- North America > United States
  - California > San Diego County > San Diego (0.04)
- Europe
  - United Kingdom > England
    - Cambridgeshire > Cambridge (0.04)
  - Switzerland > Vaud
    - Lausanne (0.04)
- Africa > Middle East
  - Tunisia > Ben Arous Governorate > Ben Arous (0.04)

Genre:
- Research Report (0.81)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language (1.00)
  - Machine Learning
    - Neural Networks > Deep Learning (0.45)
    - Statistical Learning > Gradient Descent (0.34)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found