Approximation Rate of the Transformer Architecture for Sequence Modeling