Goto

Collaborating Authors

 discrete distribution estimation


Nearly Minimax Discrete Distribution Estimation in Kullback-Leibler Divergence with High Probability

arXiv.org Machine Learning

We consider the problem of estimating a discrete distribution $p$ with support of size $K$ and provide both upper and lower bounds with high probability in KL divergence. We prove that in the worst case, for any estimator $\widehat{p}$, with probability at least $δ$, $\text{KL}(p \| \widehat{p}) \geq C\max\{K,\ln(K)\ln(1/δ) \}/n $, where $n$ is the sample size and $C > 0$ is a constant. We introduce a computationally efficient estimator $p^{\text{OTB}}$, based on Online to Batch conversion and suffix averaging, and show that with probability at least $1 - δ$ $\text{KL}(p \| \widehat{p}) \leq C(K\log(\log(K)) + \ln(K)\ln(1/δ)) /n$. Furthermore, we also show that with sufficiently many observations relative to $\log(1/δ)$, the maximum likelihood estimator $\bar{p}$ guarantees that with probability at least $1-δ$ $$ 1/6 χ^2(\bar{p}\|p) \leq 1/4 χ^2(p\|\bar{p}) \leq \text{KL}(p|\bar{p}) \leq C(K + \log(1/δ))/n\,, $$ where $χ^2$ denotes the $χ^2$-divergence.


Concentration Bounds for Discrete Distribution Estimation in KL Divergence

arXiv.org Artificial Intelligence

Discrete distribution estimation, i.e., density estimation over discrete domains, is a fundamental problem in Statistics, with a rich history (see, e.g., [9, 10] for an overview and further references). In this work, we address a simple yet surprisingly ill-understood aspect of this question: what is sample complexity of estimating an arbitrary discrete distribution in Kullback-Leibler (KL) divergence with vanishing probability of error? To describe the problem further, a few definitions are in order.


Discrete Distribution Estimation under Local Privacy

arXiv.org Machine Learning

The collection and analysis of user data drives improvements in the app and web ecosystems, but comes with risks to privacy. This paper examines discrete distribution estimation under local privacy, a setting wherein service providers can learn the distribution of a categorical statistic of interest without collecting the underlying data. We present new mechanisms, including hashed K-ary Randomized Response (KRR), that empirically meet or exceed the utility of existing mechanisms at all privacy levels. New theoretical results demonstrate the order-optimality of KRR and the existing RAPPOR mechanism at different privacy regimes.