On the Computational Power of Transformers and its Implications in Sequence Modeling
