Global Momentum Compression for Sparse Communication in Distributed SGD