The Shaped Transformer: Attention Models in the Infinite Depth-and-Width Limit

Neural Information Processing Systems 

Finally, we show, through simulations, that the SDE provides a surprisingly good description of the corresponding finite-size model. We coin the name shaped Transformer for these architectural modifications.
