Understanding the Expressive Power and Mechanisms of Transformer for Sequence Modeling

Neural Information Processing Systems 

Our study reveals the roles of critical parameters in the Transformer, such as the number of layers and the number of attention heads.
