Theoretical Analysis on how Learning Rate Warmup Accelerates Convergence
