Insights into Pre-training via Simpler Synthetic Tasks
Yuhuai Wu