The Stability-Efficiency Dilemma: Investigating Sequence Length Warmup for Training GPT Models