Simple and Optimal Sublinear Algorithms for Mean Estimation

Neural Information Processing Systems 

We study the sublinear multivariate mean estimation problem in d-dimensional Euclidean space. Specifically, we aim to find the mean µ of a ground point set A, which minimizes the sum of squared Euclidean distances of the points in Ato µ. We first show that a multiplicative (1 + ε) approximation to µ can be found with probability 1 δ using O(ε 1 logδ 1)many independent uniform random samples, and provide a matching lower bound. Furthermore, we give two estimators with optimal sample complexity that can be computed in optimal running time for extracting a suitable approximate mean: 1.