Separating populations with wide data: A spectral analysis

Avrim Blum, Amin Coja-Oghlan, Alan Frieze, Shuheng Zhou

Jan-29-2009–arXiv.org Machine Learning

In this paper, we consider the problem of partitioning a small data sample drawn from a mixture of $k$ product distributions. We are interested in the case where individual features are of low average quality $\gamma$, and we want to use as few of them as possible to correctly partition the sample. We analyze a spectral technique that approximately optimizes the total data size (the product of the number of data points $n$ and the number of features $K$) needed to perform this partitioning correctly, as a function of $1/\gamma$ for $K>n$. Our goal is motivated by an application in clustering individuals according to their population of origin using markers, when the divergence between any two of the populations is small.
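
To make the setting concrete, here is a minimal Python sketch of spectral partitioning for $k = 2$ on wide binary data ($K > n$). It is not the algorithm analyzed in the paper. It is a generic illustration that centers the data matrix and splits the sample by the sign of the top left singular vector. The function names, the balanced two-population generator, and the reading of $\gamma$ as the mean squared per-feature frequency difference are all assumptions made for this example.

```python
import numpy as np

def sample_two_populations(n, K, gamma, rng):
    """Hypothetical generator: n individuals, K binary markers, two populations.

    Assumption for this sketch: each feature's frequency differs between the
    two populations by sqrt(gamma), so gamma is the mean squared difference.
    """
    labels = np.arange(n) % 2                      # balanced population labels
    p0 = rng.uniform(0.3, 0.7, size=K)             # marker frequencies, population 0
    delta = np.sqrt(gamma) * rng.choice([-1.0, 1.0], size=K)
    p1 = np.clip(p0 + delta, 0.01, 0.99)           # marker frequencies, population 1
    P = np.where(labels[:, None] == 0, p0, p1)     # n x K matrix of Bernoulli means
    X = (rng.random((n, K)) < P).astype(float)     # independent features (product distribution)
    return X, labels

def spectral_bipartition(X):
    """Split the rows of X into two groups using the top left singular vector."""
    Xc = X - X.mean(axis=0, keepdims=True)         # remove the per-feature mean
    U, _, _ = np.linalg.svd(Xc, full_matrices=False)
    return (U[:, 0] > 0).astype(int)               # sign of each row's projection

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X, labels = sample_two_populations(n=40, K=2000, gamma=0.04, rng=rng)
    pred = spectral_bipartition(X)
    # Cluster ids are arbitrary, so score up to swapping the two labels.
    acc = max(np.mean(pred == labels), np.mean(pred != labels))
    print(f"fraction correctly partitioned: {acc:.2f}")
```

In this toy model, lowering `gamma` weakens the per-feature signal, and more features `K` (or more points `n`) are needed before the sign split recovers the populations, which is the trade-off the abstract describes.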

artificial intelligence, machine learning, separating population

Country:
- Europe (0.45)
- North America > United States (0.28)

Genre:
- Research Report (0.64)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Machine Learning
    - Statistical Learning (0.46)
    - Learning Graphical Models (0.46)
