Approximation Rate of the Transformer Architecture for Sequence Modeling

Neural Information Processing Systems 

In this work, we investigate approximation rates of the Transformer architecture for general sequence-to-sequence target relationships.