Transformers Efficiently Perform In-Context Logistic Regression via Normalized Gradient Descent

May-8-2026–arXiv.org Machine Learning

One widely recognized explanation for the empirical success of transformers is their ability to perform in-context learning (ICL): pretrained transformers can perform previously unseen tasks based on demonstrations and examples in the prompt, without any additional task-specific fine-tuning (Brown et al., 2020). A line of recent work interprets this ICL capability from an algorithmic perspective, viewing transformers as models that can implicitly execute certain learning algorithms on the context examples. Specifically, Garg et al. (2022) propose a theoretical framework for ICL in terms of learning a hypothesis class, and empirically show that transformers can in-context learn the linear function class. Motivated by this empirical finding, several recent works study theoretically how transformers perform in-context learning on linear regression tasks. Akyürek et al. (2022) and Von Oswald et al. (2023) construct multi-layer transformers with linear attention that can execute gradient descent on an "in-context loss" defined on the context data, thereby enabling in-context learning of linear regression.
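The link between linear attention and gradient descent on an in-context loss can be illustrated with a small numerical check. The sketch below is not the construction from any of the cited papers, and it covers plain gradient descent on linear regression rather than the normalized gradient descent for logistic regression named in the title. It assumes a squared in-context loss, zero initialization, and a single step with a fixed learning rate. All function and variable names (`gd_step`, `linear_attention_prediction`, `lr`, and so on) are hypothetical and chosen for illustration only.

```python
import random


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def in_context_loss(w, xs, ys):
    """Assumed in-context loss: L(w) = 1/(2n) * sum_i (<w, x_i> - y_i)^2."""
    n = len(xs)
    return sum((dot(w, x) - y) ** 2 for x, y in zip(xs, ys)) / (2 * n)


def gd_step(w, xs, ys, lr):
    """One plain gradient descent step on the in-context loss."""
    n, d = len(xs), len(w)
    grad = [0.0] * d
    for x, y in zip(xs, ys):
        residual = dot(w, x) - y
        for j in range(d):
            grad[j] += residual * x[j] / n
    return [w[j] - lr * grad[j] for j in range(d)]


def linear_attention_prediction(xs, ys, x_query, lr):
    """Softmax-free attention readout: keys x_i, values y_i, query x_query.

    Computes (lr / n) * sum_i y_i * <x_i, x_query>.
    """
    n = len(xs)
    return lr / n * sum(y * dot(x, x_query) for x, y in zip(xs, ys))


# Synthetic context: noiseless linear labels from a hidden weight vector.
random.seed(0)
d, n, lr = 3, 20, 0.1
w_star = [random.gauss(0, 1) for _ in range(d)]
xs = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
ys = [dot(w_star, x) for x in xs]
x_query = [random.gauss(0, 1) for _ in range(d)]

# One GD step from w = 0, then predict on the query point.
w0 = [0.0] * d
w1 = gd_step(w0, xs, ys, lr)
pred_gd = dot(w1, x_query)

# The same prediction written as a linear attention readout.
pred_attn = linear_attention_prediction(xs, ys, x_query, lr)

print(abs(pred_gd - pred_attn) < 1e-9)
print(in_context_loss(w1, xs, ys) < in_context_loss(w0, xs, ys))
```

Starting from zero weights, the gradient step gives w = (lr / n) Σ y_i x_i, so the prediction on the query is (lr / n) Σ y_i ⟨x_i, x_query⟩, which is exactly a linear attention readout over the context. Stacking such layers to emulate several gradient steps requires additional bookkeeping that this sketch does not attempt.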


