Approximation theory of transformer networks for sequence modeling