Learned Gradient Compression for Distributed Deep Learning