Analyzing & Reducing the Need for Learning Rate Warmup in GPT Training