Approximation theory of transformer networks for sequence modeling
