Analyzing & Reducing the Need for Learning Rate Warmup in GPT Training
–Neural Information Processing Systems
Learning rate warmup is a popular heuristic for training neural networks, especially at larger batch sizes, despite limited understanding of its benefits.
May-28-2025, 08:03:04 GMT
- Country:
- North America > United States (0.14)
- Genre:
- Research Report > Experimental Study (1.00)
- Technology: