Goto

Collaborating Authors

 James Kwok



Scalable Robust Matrix Factorization with Nonconvex Loss

Neural Information Processing Systems

The state-of-the-art RMF solver (RMF-MM) is slow and cannot utilize data sparsity. In this paper, we propose to improve robustness by using nonconvex loss functions. The resultant optimization problem is difficult. To improve efficiency and scalability, we use majorization-minimization (MM) and optimize the MM surrogate by running the accelerated proximal gradient algorithm on its dual problem. Data sparsity can also be exploited. The resultant algorithm has low time and space complexities, and is guaranteed to converge to a critical point. Extensive experiments show that it outperforms the state of the art in terms of both accuracy and speed.
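The abstract only sketches the algorithm, so here is a minimal, hypothetical illustration of the MM idea it describes. All names are ours; the Welsch loss is an assumed choice of nonconvex loss (the paper's losses may differ), and the surrogate is minimized with plain gradient steps rather than the paper's accelerated proximal gradient on the dual. Data sparsity is not exploited here.

```python
# A minimal sketch (not the paper's implementation): MM for robust matrix
# factorization with the nonconvex Welsch loss rho(r) = 1 - exp(-r^2/sigma^2).
import numpy as np

def mm_rmf(X, rank, sigma=5.0, outer_iters=20, inner_iters=25,
           lr=0.01, lam=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U = 0.1 * rng.standard_normal((m, rank))
    V = 0.1 * rng.standard_normal((n, rank))
    for _ in range(outer_iters):
        # Majorization: rho is concave in r^2, so at the current residuals R
        # it is upper-bounded by the weighted quadratic W * R^2 + const with
        # W = exp(-R^2 / sigma^2). Large residuals (outliers) get small
        # weights -- that is where the robustness comes from.
        R = X - U @ V.T
        W = np.exp(-(R ** 2) / sigma ** 2)
        # Minimization: gradient steps on the weighted least-squares surrogate
        # 0.5*sum_ij W_ij (X - U V^T)_ij^2 + 0.5*lam*(||U||^2 + ||V||^2).
        for _ in range(inner_iters):
            G = -(W * (X - U @ V.T))       # d(surrogate) / d(U V^T)
            U, V = (U - lr * (G @ V + lam * U),
                    V - lr * (G.T @ U + lam * V))
    return U, V

# Toy usage: a rank-5 matrix corrupted by sparse large outliers.
rng = np.random.default_rng(1)
X = rng.standard_normal((60, 5)) @ rng.standard_normal((5, 40))
X[rng.random(X.shape) < 0.05] += 10.0      # sparse outliers
U, V = mm_rmf(X, rank=5)
```

Note that the weights W are computed once per outer iteration and held fixed while the inner loop runs, which is what makes this an MM scheme rather than ordinary reweighting on every step.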



Communication-Efficient Distributed Blockwise Momentum SGD with Error-Feedback

Neural Information Processing Systems

Communication overhead is a major bottleneck hampering the scalability of distributed machine learning systems. Recently, there has been a surge of interest in using gradient compression to improve the communication efficiency of distributed neural network training. Using 1-bit quantization, signSGD with majority vote achieves a 32x reduction in communication cost. However, its convergence guarantee rests on unrealistic assumptions, and the method can diverge in practice. In this paper, we propose a general distributed compressed SGD with Nesterov's momentum.
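To make the error-feedback mechanism concrete, here is a minimal single-process simulation in the spirit of the abstract. It is a sketch under our own assumptions: the scaled-sign compressor, the Nesterov-style momentum form, and every function name are ours, and the paper's blockwise compression (one scale per parameter block) is collapsed to a single block.

```python
# Sketch: compressed SGD with momentum and error feedback, simulated on one
# machine. Each "worker" sends a 1-bit sign message, keeps the compression
# residual locally, and adds it back before the next compression.
import numpy as np

def scaled_sign(v):
    # 1-bit compressor: transmit sign(v) plus a single scalar (the mean
    # magnitude), so the decompressed message keeps roughly the right scale.
    return np.abs(v).mean() * np.sign(v)

def dist_ef_momentum_sgd(grad_fn, x0, n_workers=4, lr=0.01, beta=0.9, steps=400):
    x = x0.astype(float).copy()
    mom = [np.zeros_like(x) for _ in range(n_workers)]  # momentum buffers
    err = [np.zeros_like(x) for _ in range(n_workers)]  # error-feedback memory
    for _ in range(steps):
        msgs = []
        for i in range(n_workers):
            g = grad_fn(x)                  # worker-local stochastic gradient
            mom[i] = beta * mom[i] + g      # momentum buffer
            d = g + beta * mom[i]           # Nesterov-style direction
            p = lr * d + err[i]             # add back previous compression error
            c = scaled_sign(p)              # 1-bit message sent to the server
            err[i] = p - c                  # remember what the compressor dropped
            msgs.append(c)
        x -= np.mean(msgs, axis=0)          # server aggregates and updates
    return x

# Toy usage: noisy gradients of f(x) = 0.5 * ||x||^2; x should shrink toward 0.
rng = np.random.default_rng(0)
noisy_grad = lambda x: x + 0.1 * rng.standard_normal(x.shape)
x = dist_ef_momentum_sgd(noisy_grad, np.full(10, 5.0))
```

The key design point is the error memory err[i]: whatever the 1-bit compressor drops in one round is added back before the next compression, so the quantization bias does not accumulate the way it can in plain signSGD.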

