The Sharpness Disparity Principle in Transformers for Accelerating Language Model Pre-Training