On the Computational Power of Transformers and its Implications in Sequence Modeling