Theoretical Analysis on how Learning Rate Warmup Accelerates Convergence