Infinite Limits of Multi-head Transformer Dynamics
Neural Information Processing Systems
In this work, we analyze various scaling limits of the training dynamics of transformer models in the feature learning regime.
May-29-2025, 06:16:53 GMT
- Country:
- North America > United States (0.14)
- Genre:
- Research Report > Experimental Study (0.92)
- Technology: