Smoothness Matrices Beat Smoothness Constants: Better Communication Compression Techniques for Distributed Optimization

Neural Information Processing Systems 

Large-scale distributed optimization has become the default tool for training supervised machine learning models with large numbers of parameters and large amounts of training data.