Gradient Sparsification for Communication-Efficient Distributed Optimization