Infinite Limits of Multi-head Transformer Dynamics

Neural Information Processing Systems 

In this work, we analyze various scaling limits of the training dynamics of transformer models in the feature learning regime.
