Learning Populations of Parameters
Tian, Kevin, Kong, Weihao, Valiant, Gregory
–Neural Information Processing Systems
Consider the following estimation problem: there are $n$ entities, each with an unknown parameter $p_i \in [0,1]$, and we observe $n$ independent random variables, $X_1,\ldots,X_n$, with $X_i \sim $ Binomial$(t, p_i)$. How accurately can one recover the ``histogram'' (i.e. cumulative density function) of the $p_i$'s? While the empirical estimates would recover the histogram to earth mover distance $\Theta(\frac{1}{\sqrt{t}})$ (equivalently, $\ell_1$ distance between the CDFs), we show that, provided $n$ is sufficiently large, we can achieve error $O(\frac{1}{t})$ which is information theoretically optimal. We also extend our results to the multi-dimensional parameter case, capturing settings where each member of the population has multiple associated parameters. Beyond the theoretical results, we demonstrate that the recovery algorithm performs well in practice on a variety of datasets, providing illuminating insights into several domains, including politics, sports analytics, and variation in the gender ratio of offspring.
Neural Information Processing Systems
Dec-31-2017
- Country:
- North America > United States > California > Santa Clara County (0.14)
- Genre:
- Research Report > New Finding (0.48)
- Industry:
- Health & Medicine (1.00)
- Leisure & Entertainment > Sports
- Basketball (1.00)
- Technology: