Model Selection in Clustering by Uniform Convergence Bounds

Buhmann, Joachim M., Held, Marcus

Neural Information Processing Systems 

Unsupervised learning algorithms are designed to extract structure from data samples. Reliable and robust inference requires a guarantee that extracted structures are typical for the data source, i.e., similar structures have to be inferred from a second sample set of the same data source. The overfitting phenomenon in maximum-entropy-based annealing algorithms is studied, by way of example, for a class of histogram clustering models. Bernstein's inequality for large deviations is used to determine the maximally achievable approximation quality, parameterized by a minimal temperature. Monte Carlo simulations support the proposed model selection criterion by finite-temperature annealing.
