Rethinking gradient sparsification as total error minimization

Open in new window