The Effect of Network Width on the Performance of Large-batch Training

Lingjiao Chen, Hongyi Wang, Jinman Zhao, Dimitris Papailiopoulos, Paraschos Koutris

Neural Information Processing Systems 

Distributed implementations of mini-batch stochastic gradient descent (SGD) suffer from communication overhead caused by the high frequency of gradient updates inherent in small-batch training.
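To make the overhead concrete, here is a minimal sketch (not code from the paper) of why small batches imply more communication in data-parallel SGD: each mini-batch triggers one gradient synchronization across workers, so the number of synchronization rounds per epoch scales as the dataset size divided by the batch size.

```python
def sync_rounds_per_epoch(num_examples: int, batch_size: int) -> int:
    """Each mini-batch triggers one all-reduce of gradients across workers,
    so rounds per epoch = ceil(num_examples / batch_size)."""
    return -(-num_examples // batch_size)  # ceiling division

if __name__ == "__main__":
    n = 1_000_000  # hypothetical dataset size, for illustration only
    for b in (32, 256, 8192):
        print(f"batch={b:>5}: {sync_rounds_per_epoch(n, b):>6} sync rounds/epoch")
```

With a fixed dataset, growing the batch size from 32 to 8192 cuts the number of synchronization rounds per epoch by a factor of 256, which is the communication motivation for large-batch training.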
