Analyzing & Reducing the Need for Learning Rate Warmup in GPT Training
Neural Information Processing Systems
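Learning-rate warmup, which gradually increases the learning rate from a small initial value over the first steps of training, is a popular heuristic for training neural networks, especially at larger batch sizes, despite limited understanding of its benefits.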
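To make "warmup" concrete, the following is a minimal Python sketch of one common schedule used in GPT-style training: a linear ramp up to a peak learning rate, followed by cosine decay. It is a generic schedule assumed for illustration, not the method analyzed or proposed in this paper. The function `lr_at_step` and its parameters (`peak_lr`, `warmup_steps`, `total_steps`, `min_lr`) are hypothetical names.

```python
import math


def lr_at_step(step: int, peak_lr: float, warmup_steps: int,
               total_steps: int, min_lr: float = 0.0) -> float:
    """Hypothetical helper: learning rate at a given optimizer step.

    Assumed schedule (illustrative only): linear warmup to peak_lr,
    then cosine decay from peak_lr down to min_lr at total_steps.
    """
    if step < warmup_steps:
        # Warmup phase: ramp linearly from peak_lr / warmup_steps
        # up to peak_lr at step warmup_steps - 1.
        return peak_lr * (step + 1) / warmup_steps
    # Fraction of the post-warmup phase completed, clamped to [0, 1]
    # so steps past total_steps stay at min_lr.
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    # Cosine decay: equals peak_lr at progress 0 and min_lr at progress 1.
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

For example, `[lr_at_step(s, 6e-4, 100, 1000) for s in range(1000)]` gives a per-step schedule that rises over the first 100 steps and then decays toward `min_lr`. The ramp here starts at `peak_lr / warmup_steps` rather than zero so the first update is not a no-op; other codebases may start from zero or use a different warmup shape.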
Nov-13-2025, 08:17:30 GMT
- Country:
- Asia
- China > Guangxi Province
- Nanning (0.04)
- Middle East > Jordan (0.04)
- Europe > Switzerland (0.04)
- North America
- Canada > Ontario
- Toronto (0.04)
- United States > Texas
- Coleman County (0.04)
- Genre:
- Research Report > Experimental Study (1.00)
- Technology: