communication cost
Can Decentralized Algorithms Outperform Centralized Algorithms? A Case Study for Decentralized Parallel Stochastic Gradient Descent
Most distributed machine learning systems nowadays, including TensorFlow and CNTK, are built in a centralized fashion. One bottleneck of centralized algorithms lies on high communication cost on the central node. Motivated by this, we ask, can decentralized algorithms be faster than its centralized counterpart? Although decentralized PSGD (D-PSGD) algorithms have been studied by the control community, existing analysis and theory do not show any advantage over centralized PSGD (C-PSGD) algorithms, simply assuming the application scenario where only the decentralized network is available. In this paper, we study a D-PSGD algorithm and provide the first theoretical analysis that indicates a regime in which decentralized algorithms might outperform centralized algorithms for distributed stochastic gradient descent. This is because D-PSGD has comparable total computational complexities to C-PSGD but requires much less communication cost on the busiest node.
A Communication-Efficient Parallel Algorithm for Decision Tree
Decision tree (and its extensions such as Gradient Boosting Decision Trees and Random Forest) is a widely used machine learning algorithm, due to its practical effectiveness and model interpretability. With the emergence of big data, there is an increasing need to parallelize the training process of decision tree. However, most existing attempts along this line suffer from high communication costs. In this paper, we propose a new algorithm, called \emph{Parallel Voting Decision Tree (PV-Tree)}, to tackle this challenge. After partitioning the training data onto a number of (e.g., $M$) machines, this algorithm performs both local voting and global voting in each iteration.
Distributed k -Clustering for Data with Heavy Noise
In this paper, we consider the $k$-center/median/means clustering with outliers problems (or the $(k, z)$-center/median/means problems) in the distributed setting. Most previous distributed algorithms have their communication costs linearly depending on $z$, the number of outliers. Recently Guha et al.[10] overcame this dependence issue by considering bi-criteria approximation algorithms that output solutions with $2z$ outliers. For the case where $z$ is large, the extra $z$ outliers discarded by the algorithms might be too large, considering that the data gathering process might be costly. In this paper, we improve the number of outliers to the best possible $(1+\epsilon)z$, while maintaining the $O(1)$-approximation ratio and independence of communication cost on $z$. The problems we consider include the $(k, z)$-center problem, and $(k, z)$-median/means problems in Euclidean metrics. Implementation of the our algorithm for $(k, z)$-center shows that it outperforms many previous algorithms, both in terms of the communication cost and quality of the output solution.
A Linear Speedup Analysis of Distributed Deep Learning with Sparse and Quantized Communication
The large communication overhead has imposed a bottleneck on the performance of distributed Stochastic Gradient Descent (SGD) for training deep neural networks. Previous works have demonstrated the potential of using gradient sparsification and quantization to reduce the communication cost. However, there is still a lack of understanding about how sparse and quantized communication affects the convergence rate of the training algorithm. In this paper, we study the convergence rate of distributed SGD for non-convex optimization with two communication reducing strategies: sparse parameter averaging and gradient quantization. We show that $O(1/\sqrt{MK})$ convergence rate can be achieved if the sparsification and quantization hyperparameters are configured properly. We also propose a strategy called periodic quantized averaging (PQASGD) that further reduces the communication cost while preserving the $O(1/\sqrt{MK})$ convergence rate. Our evaluation validates our theoretical results and shows that our PQASGD can converge as fast as full-communication SGD with only $3\%-5\%$ communication data size.
- North America > Canada (0.04)
- Europe > Germany (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Europe > Sweden > Stockholm > Stockholm (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- Europe > United Kingdom (0.04)
- Education (0.67)
- Information Technology (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Communications > Networks (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
- North America > United States > California (0.14)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- Asia > China (0.04)
- (2 more...)
- Information Technology > Security & Privacy (1.00)
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Communications (1.00)
- (2 more...)
- North America > Canada > Quebec > Montreal (0.28)
- North America > United States > Virginia (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- (3 more...)