The Effect of Network Width on the Performance of Large-batch Training
Lingjiao Chen, Hongyi Wang, Jinman Zhao, Dimitris Papailiopoulos, Paraschos Koutris
Neural Information Processing Systems
Distributed implementations of mini-batch stochastic gradient descent (SGD) suffer from communication overheads, attributed to the high frequency of gradient updates inherent in small-batch training.
Nov-20-2025, 20:57:20 GMT
- Country:
- North America
- Canada > Quebec
- Montreal (0.04)
- United States > Wisconsin
- Dane County > Madison (0.04)
- Genre:
- Research Report (0.46)
- Technology: