Robust Mean Estimation on Highly Incomplete Data with Arbitrary Outliers
We study the problem of robustly estimating the mean of a $d$-dimensional distribution given $N$ examples, where $\varepsilon N$ examples may be arbitrarily corrupted and most coordinates of every example may be missing. Assuming each coordinate appears in a constant factor more than $\varepsilon N$ examples, we show algorithms that estimate the mean of the distribution with information-theoretically optimal dimension-independent error guarantees in nearly-linear time $\widetilde O(Nd)$. Our results extend recent work on computationally-efficient robust estimation to a more widely applicable incomplete-data setting.
Aug-19-2020
- Country:
- North America > United States
- Massachusetts (0.04)
- California > Santa Clara County
- Palo Alto (0.04)
- Asia > Afghanistan
- Parwan Province > Charikar (0.04)
- North America > United States
- Genre:
- Research Report (0.70)
- Industry:
- Education > Educational Setting > Online (0.68)
- Technology: