The Mean Partition Theorem of Consensus Clustering
Clustering is a standard technique for exploratory data analysis that finds applications across different disciplines such as computer science, biology, marketing, and social science. The goal of clustering is to group a set of unlabeled data points into several clusters based on some notion of dissimilarity. Inspired by the success of classifier ensembles, consensus clustering has emerged as a research topic [8, 23]. Consensus clustering first generates several partitions of the same dataset. Then it combines the sample partitions to a single consensus partition. The assumption is that a consensus partition better fits to the hidden structure in the data than individual partitions. One standard approach of consensus clustering combines the sample partitions to a mean partition [3, 4, 5, 6, 9, 17, 20, 21, 22].
Apr-26-2016