Reliable Learning of Bernoulli Mixture Models
Amir Najafi, Abolfazl Motahari, Hamid R. Rabiee
In this paper, we derive a set of sufficient conditions for reliable clustering of data produced by Bernoulli Mixture Models (BMMs) when the number of clusters is unknown. A BMM describes a random binary vector whose components, conditioned on the cluster, are independent Bernoulli trials with cluster-specific frequencies. The problem of clustering BMM data arises in many real-world applications, most notably in population genetics, where researchers aim to infer population structure from multilocus genotype data. Our findings establish a minimum dataset size and a minimum number of Bernoulli trials (or genotyped loci) per sample that together guarantee the existence of a clustering algorithm with sufficient accuracy. Moreover, the mathematical intuitions and tools behind our work can help researchers design more effective and theoretically plausible heuristic methods for similar problems.
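To make the generative model concrete, here is a minimal Python sketch that samples binary vectors from a BMM and scores them with the per-cluster Bernoulli log-likelihood. It is not the authors' algorithm: the function names (`sample_bmm`, `log_likelihood`, `oracle_assign`, `oracle_accuracy`) are hypothetical, and the assignment step assumes the true cluster frequencies are already known, which the clustering problem studied here does not allow. It only illustrates why more loci per sample make cluster membership easier to recover.

```python
import math
import random
from typing import List, Sequence, Tuple


def sample_bmm(
    n: int,
    weights: Sequence[float],
    freqs: Sequence[Sequence[float]],
    rng: random.Random,
) -> Tuple[List[List[int]], List[int]]:
    """Draw n binary vectors (and their hidden cluster labels) from a BMM.

    weights[k] is the mixing probability of cluster k; freqs[k][j] is the
    probability that component (locus) j of a vector from cluster k equals 1.
    """
    clusters = range(len(weights))
    samples: List[List[int]] = []
    labels: List[int] = []
    for _ in range(n):
        # Pick the hidden cluster, then flip one independent coin per component.
        k = rng.choices(clusters, weights=weights)[0]
        samples.append([1 if rng.random() < p else 0 for p in freqs[k]])
        labels.append(k)
    return samples, labels


def log_likelihood(x: Sequence[int], p: Sequence[float]) -> float:
    """Log-probability of binary vector x under independent Bernoulli(p[j]) trials.

    Assumes every frequency lies strictly between 0 and 1.
    """
    return sum(math.log(pj) if xj else math.log(1.0 - pj) for xj, pj in zip(x, p))


def oracle_assign(x: Sequence[int], freqs: Sequence[Sequence[float]]) -> int:
    """Hypothetical helper: maximum-likelihood cluster for x given the TRUE
    frequencies (mixing weights are ignored, i.e. equal priors are assumed)."""
    return max(range(len(freqs)), key=lambda k: log_likelihood(x, freqs[k]))


def oracle_accuracy(samples, labels, freqs) -> float:
    """Fraction of samples whose oracle assignment matches the hidden label."""
    hits = sum(oracle_assign(x, freqs) == y for x, y in zip(samples, labels))
    return hits / len(samples)


if __name__ == "__main__":
    rng = random.Random(0)
    for d in (2, 10, 50):
        # Two clusters with opposite per-locus frequencies across d loci.
        freqs = [[0.3] * d, [0.7] * d]
        xs, ys = sample_bmm(500, [0.5, 0.5], freqs, rng)
        print(d, oracle_accuracy(xs, ys, freqs))
```

Running the script prints the oracle accuracy for 2, 10, and 50 loci; it should typically rise with the number of loci, loosely echoing the abstract's point that a minimum number of Bernoulli trials per sample is needed. The paper's actual setting, with unknown frequencies and an unknown number of clusters, is harder than this oracle baseline.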
October 5, 2017