Transformers learn to implement preconditioned gradient descent for in-context learning

Neural Information Processing Systems 

Several recent works demonstrate that transformers can implement algorithms such as gradient descent. Through careful weight constructions, these works show that multiple transformer layers are expressive enough to simulate iterations of gradient descent.
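As a minimal sketch of the idea (not the paper's exact construction): for in-context linear regression, one step of preconditioned gradient descent from a zero initialization predicts `x_q^T P (sum_i y_i x_i)`, and a single linear self-attention layer with suitably constructed weights computes the same quantity. The preconditioner `P` below is an illustrative choice, not one prescribed by the paper.

```python
import numpy as np

# In-context linear regression: n examples (x_i, y_i) plus one query x_q.
rng = np.random.default_rng(0)
d, n = 4, 32
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ w_true
x_q = rng.normal(size=d)

# One step of preconditioned gradient descent on
#   L(w) = 0.5 * sum_i (w . x_i - y_i)^2, starting from w = 0,
# gives w_1 = P @ (X^T y), where P is the (symmetric) preconditioner.
P = np.linalg.inv(X.T @ X + 1e-3 * np.eye(d))  # illustrative choice of P
w_1 = P @ (X.T @ y)
pred_gd = x_q @ w_1

# The same prediction via one linear self-attention layer whose
# query/key weights encode P: the output on the query token is
#   sum_i <x_q, P x_i> * y_i  =  x_q^T P (sum_i y_i x_i).
scores = (X @ P) @ x_q   # attention scores <x_q, P x_i>, no softmax
pred_attn = scores @ y   # value aggregation over the context labels

assert np.allclose(pred_gd, pred_attn)
```

Stacking such layers then corresponds to running multiple preconditioned gradient-descent iterations in the forward pass.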
