When Do Transformers Outperform Feedforward and Recurrent Networks? A Statistical Perspective