Empirical Analysis on Top-k Gradient Sparsification for Distributed Deep Learning in a Supercomputing Environment