When Do Transformers Outperform Feedforward and Recurrent Networks? A Statistical Perspective
