Reviews: Leader Stochastic Gradient Descent for Distributed Training of Deep Learning Models

Feb-5-2025, 22:54:10 GMT–Neural Information Processing Systems

After rebuttal: I have carefully read the authors' response. Unfortunately, I do not think it adequately addresses my concerns. See Table 2 in "Regularizing and Optimizing LSTM Language Models" for comparison; (4) the performance of SGD on a single GTX1080 GPU does not indicate how it performs with multiple workers (larger mini-batch size); (5) selecting the learning rate based on the test error is not good practice. In machine learning, hyper-parameters should be selected according to accuracy on a held-out validation set. Considering the above five points, I have decided to keep my score unchanged.
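To make point (5) concrete, here is a minimal sketch of selecting a learning rate on a held-out validation split and touching the test split only once, after the choice is made. It is not the authors' setup: the toy linear model, the synthetic data, the candidate learning rates, and the names `sgd_fit`, `mse`, and `select_lr` are all hypothetical and chosen only for illustration.

```python
import random


def sgd_fit(train, lr, epochs=50, seed=0):
    """Fit y ~ w*x + b with plain SGD on squared error (toy model)."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    data = list(train)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return w, b


def mse(model, data):
    """Mean squared error of a (w, b) model on a list of (x, y) pairs."""
    w, b = model
    return sum(((w * x + b) - y) ** 2 for x, y in data) / len(data)


def select_lr(train, val, candidates):
    """Return the learning rate with the lowest validation error.

    The test split is deliberately not an argument here, so it cannot
    influence the choice.
    """
    best_lr, best_model, best_val = None, None, float("inf")
    for lr in candidates:
        model = sgd_fit(train, lr)
        val_err = mse(model, val)
        if val_err < best_val:
            best_lr, best_model, best_val = lr, model, val_err
    return best_lr, best_model, best_val


# Synthetic data (assumption): y = 2x + 1 plus small Gaussian noise.
rng = random.Random(42)
xs = [rng.uniform(-1.0, 1.0) for _ in range(300)]
points = [(x, 2.0 * x + 1.0 + rng.gauss(0.0, 0.1)) for x in xs]
train, val, test = points[:200], points[200:250], points[250:]

candidates = [0.001, 0.01, 0.1]
best_lr, model, val_err = select_lr(train, val, candidates)

# The test error is computed once, after the learning rate is fixed.
test_err = mse(model, test)
print(f"selected lr={best_lr}, val MSE={val_err:.4f}, test MSE={test_err:.4f}")
```

Because `select_lr` never sees the test split, the reported test error remains an unbiased estimate for the chosen configuration, which is the property lost when the learning rate is tuned on test error directly.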


