bisecting k-means
Approximation Bounds for Hierarchical Clustering: Average Linkage, Bisecting K-means, and Local Search
Hierarchical clustering is a data analysis method that has been used for decades. Despite its widespread use, the method has an underdeveloped analytical foundation. Having a well understood foundation would both support the currently used methods and help guide future improvements. The goal of this paper is to give an analytic framework to better understand observations seen in practice. This paper considers the dual of a problem framework for hierarchical clustering introduced by Dasgupta.
Reviews: Approximation Bounds for Hierarchical Clustering: Average Linkage, Bisecting K-means, and Local Search
The paper extends the work of Dasgupta towards defining a theoretical framework for evaluating hierarchical clustering. The paper defines an objective function that is equivalent to that of Dasgupta in the sense that an optimal solution for the cost function defined by Dasgupta will also be optimal for the one defined in this paper and vice versa. The behaviour of approximate solution will differ though because the cost function is of the form D(G) - C(T), where D is a constant (depending on the input G) and C(T) is the cost (defined by Dasgupta) and T is the hierarchical solution. The authors show a constant approximation guarantee w.r.t. They complement this approximation guarantee by showing that this approximation guarantee is almost tight.
Approximation Bounds for Hierarchical Clustering: Average Linkage, Bisecting K-means, and Local Search
Moseley, Benjamin, Wang, Joshua
Hierarchical clustering is a data analysis method that has been used for decades. Despite its widespread use, the method has an underdeveloped analytical foundation. Having a well understood foundation would both support the currently used methods and help guide future improvements. The goal of this paper is to give an analytic framework to better understand observations seen in practice. This paper considers the dual of a problem framework for hierarchical clustering introduced by Dasgupta. The main result is that one of the most popular algorithms used in practice, average linkage agglomerative clustering, has a small constant approximation ratio for this objective.