Subspace Recovery from Heterogeneous Data with Non-isotropic Noise

Neural Information Processing Systems 

Recovering linear subspaces from data is a fundamental task in statistics and machine learning. Motivated by heterogeneity in Federated Learning settings, we study a basic formulation of this problem, principal component analysis (PCA), with a focus on handling irregular noise. Our data come from n users, with user i contributing data samples from a d-dimensional distribution with mean \mu_i. Our goal is to recover the linear subspace shared by \mu_1,\ldots,\mu_n using the data points from all users, where every data point from user i is formed by adding an independent mean-zero noise vector to \mu_i. If we have only one data point from every user, subspace recovery is information-theoretically impossible when the covariance matrices of the noise vectors can be non-spherical, which is why previous work relied on additional restrictive assumptions.
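The following is a minimal simulation sketch of the data model described above. It is not the method of the paper. All concrete choices are assumptions made for illustration: the dimensions d, k, n, the Gaussian means and noise, the axis-aligned true subspace, and the single high-variance noise coordinate. It draws one data point per user and runs naive PCA on the pooled points, which suggests how non-spherical noise can pull the estimated subspace away from the one spanned by the means.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: ambient dimension d, subspace dimension k, number of users n.
d, k, n = 20, 2, 5000

# Assumed true subspace: span of the first k coordinate axes.
U = np.eye(d)[:, :k]

# Each user's mean mu_i lies in the subspace (Gaussian coefficients are an assumption).
mus = rng.normal(size=(n, k)) @ U.T  # shape (n, d)

# Non-isotropic, mean-zero noise: unit variance everywhere except one
# direction outside the subspace, which has a much larger variance.
noise_std = np.ones(d)
noise_std[-1] = 10.0

# One data point per user: mean plus independent noise.
x = mus + rng.normal(size=(n, d)) * noise_std

# Naive PCA: top-k eigenvectors of the empirical second-moment matrix.
M = x.T @ x / n
eigvals, eigvecs = np.linalg.eigh(M)  # eigenvalues in ascending order
V = eigvecs[:, -k:]                   # estimated k-dimensional subspace

# Overlap ||U^T V||_F^2 lies in [0, k]; it equals k only for perfect recovery.
overlap = np.linalg.norm(U.T @ V) ** 2
print(f"overlap with true subspace: {overlap:.2f} (maximum {k})")

# Weight of the top principal direction on the high-variance noise coordinate.
noise_alignment = abs(eigvecs[-1, -1])
print(f"top direction weight on noisy coordinate: {noise_alignment:.2f}")
```

In this setup the leading principal direction is expected to align with the high-variance noise coordinate rather than with the subspace of the means, so the overlap should fall well short of k. This only illustrates the failure of one naive estimator; the impossibility statement in the text is information-theoretic and applies to every estimator.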