Understanding the Expressive Power and Mechanisms of Transformer for Sequence Modeling
