Improved Distributed Principal Component Analysis
Liang, Yingyu, Balcan, Maria-Florina F., Kanchanapally, Vandana, Woodruff, David
Neural Information Processing Systems
We study the distributed computing setting in which multiple servers, each holding a set of points, wish to compute functions on the union of their point sets. A key task in this setting is Principal Component Analysis (PCA), in which the servers would like to compute a low-dimensional subspace capturing as much of the variance of the union of their point sets as possible. Given a procedure for approximate PCA, one can use it to approximately solve problems such as $k$-means clustering and low-rank approximation. The essential properties of an approximate distributed PCA algorithm are its communication cost and its computational efficiency for a given desired accuracy in downstream applications. We give new algorithms and analyses for distributed PCA that lead to improved communication and computational costs for $k$-means clustering and related problems.
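To make the setting concrete, the following is a minimal sketch of one common baseline for distributed PCA, not necessarily the algorithm proposed in the paper. It assumes each server compresses its points with a local truncated SVD and sends only that summary to a coordinator, which stacks the summaries and runs a second SVD. The function names (`local_summary`, `distributed_pca`, `projection_cost`), the truncation parameter `t`, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np


def local_summary(points, t):
    """Run on one server: compress an (n_i x d) point set to at most t rows."""
    # Thin SVD of the local data: points = U @ diag(s) @ vt.
    _, s, vt = np.linalg.svd(points, full_matrices=False)
    # Keep the top-t right singular vectors, each scaled by its singular value.
    # This (t x d) matrix is all the server communicates.
    return s[:t, None] * vt[:t]


def distributed_pca(local_point_sets, k, t=None):
    """Run on the coordinator: return a (d x k) orthonormal basis."""
    t = k if t is None else t
    # In a real system each summary would arrive over the network.
    stacked = np.vstack([local_summary(p, t) for p in local_point_sets])
    _, _, vt = np.linalg.svd(stacked, full_matrices=False)
    return vt[:k].T


def projection_cost(points, basis):
    """Sum of squared distances from the points to the subspace spanned by basis."""
    residual = points - points @ basis @ basis.T
    return float(np.sum(residual ** 2))


# Synthetic example (assumed data): near rank-3 points in 20 dimensions,
# split evenly across 4 servers.
rng = np.random.default_rng(0)
k, d = 3, 20
data = rng.normal(size=(1000, k)) @ rng.normal(size=(k, d))
data += 0.1 * rng.normal(size=data.shape)
servers = np.array_split(data, 4)

# Centralized PCA on the union of all points, for comparison.
_, _, vt_all = np.linalg.svd(data, full_matrices=False)
optimal_cost = projection_cost(data, vt_all[:k].T)

# Distributed PCA with truncated summaries (k rows per server).
basis = distributed_pca(servers, k)
distributed_cost = projection_cost(data, basis)

# With t = d nothing is truncated, so the coordinator sees the exact
# covariance structure of the union and recovers the optimal cost.
exact_basis = distributed_pca(servers, k, t=d)
exact_cost = projection_cost(data, exact_basis)

print(optimal_cost, distributed_cost, exact_cost)
```

In this sketch each server communicates `t * d` numbers instead of its full point set, so `t` controls the trade-off between communication cost and accuracy that the abstract describes. The projection cost printed here is the low-rank approximation objective; a $k$-means routine could be run on the projected points `data @ basis` as the downstream application.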
Feb-14-2020, 11:44:14 GMT