Infinite Limits of Multi-head Transformer Dynamics
Neural Information Processing Systems
In this work, we analyze various scaling limits of the training dynamics of transformer models in the feature learning regime.
Nov-16-2025, 01:22:27 GMT
- Country:
- Asia > Middle East
- Jordan (0.04)
- North America > United States
- Massachusetts > Middlesex County
- Cambridge (0.04)
- Texas > Clay County (0.04)
- Genre:
- Research Report > Experimental Study (0.92)
- Technology: