Transformers learn to implement preconditioned gradient descent for in-context learning
–Neural Information Processing Systems
Several recent works demonstrate that transformers can implement algorithms like gradient descent. By a careful construction of weights, these works show that multiple layers of transformers are expressive enough to simulate iterations of gradient descent.
Neural Information Processing Systems
Feb-15-2026, 20:28:17 GMT
- Country:
- Asia > Middle East
- Jordan (0.04)
- Europe > Germany
- Bavaria > Upper Bavaria > Munich (0.04)
- North America > United States
- Massachusetts > Middlesex County > Cambridge (0.04)
- Asia > Middle East
- Genre:
- Research Report > New Finding (0.46)
- Technology: