Identification of Mixtures of Discrete Product Distributions in Near-Optimal Sample and Time Complexity
Gordon, Spencer L., Jahn, Erik, Mazaheri, Bijan, Rabani, Yuval, Schulman, Leonard J.
We consider the problem of identifying, from statistics, a distribution of discrete random variables $X_1,\ldots,X_n$ that is a mixture of $k$ product distributions. The best previous sample complexity for $n \in O(k)$ was $(1/\zeta)^{O(k^2 \log k)}$ (under a mild separation assumption parameterized by $\zeta$). The best known lower bound was $\exp(\Omega(k))$. It is known that $n\geq 2k-1$ is necessary and sufficient for identification. We show, for any $n\geq 2k-1$, how to achieve sample complexity and run-time complexity $(1/\zeta)^{O(k)}$. We also extend the known lower bound of $e^{\Omega(k)}$ to match our upper bound across a broad range of $\zeta$. Our results are obtained by combining (a) a classic method for robust tensor decomposition and (b) a novel way of bounding the condition number of key matrices called Hadamard extensions, by studying their action only on flattened rank-1 tensors.
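For readers less familiar with the model, a mixture of $k$ product distributions can be written out as in the sketch below. The notation is assumed here for illustration and does not appear in the abstract: $U$ is a hidden variable taking values in $\{1,\ldots,k\}$, $\pi_j$ is the mixing weight of component $j$, and the variables $X_1,\ldots,X_n$ are independent conditional on $U$.

```latex
\Pr[X_1 = x_1, \ldots, X_n = x_n]
  = \sum_{j=1}^{k} \pi_j \prod_{i=1}^{n} \Pr[X_i = x_i \mid U = j],
\qquad \pi_j \ge 0, \quad \sum_{j=1}^{k} \pi_j = 1.
```

In this notation, identification means recovering the weights $\pi_j$ and the conditional distributions $\Pr[X_i = \cdot \mid U = j]$ from the joint statistics of $X_1,\ldots,X_n$ alone.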
Sep-25-2023
- Country:
- Asia > Middle East
- Israel (0.14)
- North America > United States
- California (0.14)
- Genre:
- Research Report (0.70)
- Technology: