Reviews: ATOMO: Communication-efficient Learning via Atomic Sparsification
–Neural Information Processing Systems
After rebuttal: I do not wish to change my evaluation. Regarding convergence, I think this should be clarified in the paper, to at least ensure that the method does not produce divergent sequences under reasonable assumptions. As for the variance, the authors control the variance of a certain variable \hat{g} given g, but they should control the variance of \hat{g} without conditioning in order to invoke general convergence results. This is very minor but should be mentioned.

The authors consider the problem of empirical risk minimization using a distributed stochastic gradient descent algorithm.
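
The variance remark can be made concrete with the law of total variance. The following is a sketch, not a statement from the paper: it assumes that the sparsified gradient is conditionally unbiased, \mathbb{E}[\hat{g} \mid g] = g, and that g is itself a stochastic gradient with finite second moment.

```latex
% Assumption (not stated in the review): E[\hat{g} | g] = g.
\begin{align*}
\mathbb{E}\,\bigl\|\hat{g} - \mathbb{E}[\hat{g}]\bigr\|^2
  &= \mathbb{E}\Bigl[\,\mathbb{E}\bigl[\|\hat{g} - g\|^2 \,\big|\, g\bigr]\Bigr]
   + \mathbb{E}\,\bigl\|g - \mathbb{E}[g]\bigr\|^2 .
\end{align*}
% First term: expected conditional variance (the quantity controlled given g).
% Second term: variance of the stochastic gradient g itself.
```

Under these assumptions, a bound on the conditional variance that holds for every g, together with a bound on the variance of g, yields the unconditional bound that standard SGD convergence results require.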
Oct-7-2024, 08:01:38 GMT