Data Amplification: A Unified and Competitive Approach to Property Estimation
Hao, Yi; Orlitsky, Alon; Suresh, Ananda Theertha; Wu, Yihong
Neural Information Processing Systems
Estimating properties of discrete distributions is a fundamental problem in statistical learning. We design the first unified, linear-time, competitive property estimator that, for a wide class of properties and for all underlying distributions, uses just $2n$ samples to achieve the performance attained by the empirical estimator with $n\sqrt{\log n}$ samples. This provides off-the-shelf, distribution-independent "amplification" of the amount of data available relative to common-practice estimators. We illustrate the estimator's practical advantages by comparing it to existing estimators for a wide variety of properties and distributions. In most cases, its performance with $n$ samples is even as good as that of the empirical estimator with $n\log n$ samples, and for essentially all properties, its performance is comparable to that of the best existing estimator designed specifically for that property.
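For readers unfamiliar with the baseline, the empirical (plug-in) estimator evaluates a property at the observed sample frequencies. The following is a minimal sketch of that baseline only, not of the estimator proposed in the paper. Shannon entropy is chosen here as an illustrative property, and the function name `empirical_entropy` is hypothetical.

```python
import math
from collections import Counter


def empirical_entropy(samples):
    """Plug-in estimate of Shannon entropy (in nats).

    Illustrative baseline only: the property is evaluated at the
    empirical frequencies count / n. This is an assumed example and
    not the estimator designed in the paper.
    """
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one sample")
    counts = Counter(samples)
    # Sum -p * log(p) over the symbols that were actually observed.
    return -sum((c / n) * math.log(c / n) for c in counts.values())


if __name__ == "__main__":
    # Two symbols observed equally often: the estimate equals log(2).
    print(empirical_entropy(["a", "b", "a", "b"]))
```

The whole computation is a single pass over the sample counts, which is why the plug-in approach is the common-practice estimator that the abstract uses as its point of comparison.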
Dec-31-2018