PG-means: learning the number of clusters in data
Neural Information Processing Systems
We present a novel algorithm called PG-means that learns the number of clusters in a classical Gaussian mixture model. Our method is robust and efficient: it applies statistical hypothesis tests to one-dimensional projections of the data and the model to determine whether the examples are well represented by the model. The test therefore applies to the entire model at once, not just to individual clusters. We show that our method works well in difficult cases such as non-Gaussian data, overlapping clusters, eccentric clusters, high dimension, and many true clusters. Further, our method provides a much more stable estimate of the number of clusters than existing methods.
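The projection idea in the abstract can be illustrated with a minimal sketch. It assumes the mixture parameters have already been fitted (for example by EM, which is not shown) and uses the Kolmogorov-Smirnov statistic as one possible goodness-of-fit test; the abstract itself does not name the test. All function names here are hypothetical, and the fixed projection direction is chosen only to keep the example deterministic, where a real run would try several random directions.

```python
import math
import random


def project_point(x, p):
    """Project a point x onto direction p (dot product)."""
    return sum(xi * pi for xi, pi in zip(x, p))


def project_gaussian(mean, cov, p):
    """Project a Gaussian N(mean, cov) onto p: a 1-D Gaussian (p.mean, p^T cov p)."""
    dim = len(p)
    mu = project_point(mean, p)
    var = sum(p[i] * cov[i][j] * p[j] for i in range(dim) for j in range(dim))
    return mu, var


def mixture_cdf(t, weights, params):
    """CDF at t of a 1-D Gaussian mixture given (mean, variance) pairs."""
    return sum(
        w * 0.5 * (1.0 + math.erf((t - mu) / math.sqrt(2.0 * var)))
        for w, (mu, var) in zip(weights, params)
    )


def ks_statistic(data, weights, means, covs, p):
    """Largest gap between the projected data's empirical CDF and the projected model's CDF."""
    params = [project_gaussian(m, c, p) for m, c in zip(means, covs)]
    proj = sorted(project_point(x, p) for x in data)
    n = len(proj)
    d = 0.0
    for i, t in enumerate(proj):
        f = mixture_cdf(t, weights, params)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


# Synthetic data (an assumption for illustration): two separated 2-D clusters.
rng = random.Random(0)
data = [[rng.gauss(-5, 1), rng.gauss(0, 1)] for _ in range(500)]
data += [[rng.gauss(5, 1), rng.gauss(0, 1)] for _ in range(500)]

direction = [1.0, 0.0]  # fixed here; random unit vectors would be used in practice

# Candidate model with one cluster (moment-matched to the data distribution).
d_one = ks_statistic(data, [1.0], [[0.0, 0.0]], [[[26.0, 0.0], [0.0, 1.0]]], direction)

# Candidate model with two clusters (the generating parameters).
identity = [[1.0, 0.0], [0.0, 1.0]]
d_two = ks_statistic(
    data, [0.5, 0.5], [[-5.0, 0.0], [5.0, 0.0]], [identity, identity], direction
)

print("KS statistic, 1 cluster:", d_one)
print("KS statistic, 2 clusters:", d_two)
```

Because the whole mixture is projected and tested at once, a single statistic reflects how well the full model fits. A large value for the one-cluster model would suggest adding a cluster, while a small value for the two-cluster model would suggest stopping. Turning the statistic into an accept or reject decision requires a critical value, which this sketch leaves out.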
Dec-31-2007
- Country:
- North America > United States
- Texas > McLennan County
- Waco (0.04)
- California > Santa Clara County
- Palo Alto (0.04)