Approximation Bounds for Hierarchical Clustering: Average Linkage, Bisecting K-means, and Local Search

Benjamin Moseley, Joshua Wang

Neural Information Processing Systems

Hierarchical clustering is a data analysis method that has been used for decades. Despite its widespread use, the method has an underdeveloped analytical foundation. Having a well-understood foundation would both support the currently used methods and help guide future improvements. The goal of this paper is to give an analytic framework to better understand observations seen in practice. This paper considers the dual of a problem framework for hierarchical clustering introduced by Dasgupta.
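To make the objective framework attributed to Dasgupta concrete, here is a minimal sketch (helper names are hypothetical and trees are represented as nested tuples, not the paper's notation) of Dasgupta's cost, against which the dual objective is defined:

```python
def leaves(t):
    """Leaf labels under tree node t (a leaf label or a tuple of subtrees)."""
    if not isinstance(t, tuple):
        return {t}
    out = set()
    for child in t:
        out |= leaves(child)
    return out

def dasgupta_cost(tree, w):
    """Sum, over similarity-weighted pairs (i, j), of w[i, j] times the number
    of leaves under the pair's lowest common ancestor in `tree`."""
    if not isinstance(tree, tuple):
        return 0
    parts = [leaves(child) for child in tree]
    n_here = sum(len(p) for p in parts)
    total = 0
    # pairs separated at this node have their lowest common ancestor here
    for a in range(len(parts)):
        for b in range(a + 1, len(parts)):
            for i in parts[a]:
                for j in parts[b]:
                    total += w.get((i, j), w.get((j, i), 0.0)) * n_here
    return total + sum(dasgupta_cost(child, w) for child in tree)

# three points with unit pairwise similarity; the tree merges 0 and 1 first
w = {(0, 1): 1.0, (0, 2): 1.0, (1, 2): 1.0}
print(dasgupta_cost(((0, 1), 2), w))  # 8.0: pair (0,1) pays 2, pairs (0,2),(1,2) pay 3 each
```

With unit weights every tree over three points attains the same cost here; the objective only discriminates between trees once the similarities are non-uniform.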




Reviews: Subquadratic High-Dimensional Hierarchical Clustering

Neural Information Processing Systems

This paper proposes a new approach to approximating hierarchical agglomerative clustering (HAC) by requiring that, at each round, only a gamma-best merge be performed (gamma being the multiplicative approximation factor to the closest pair). Two algorithms are introduced to approximate HAC: one for Ward's linkage and one for average linkage. In both cases, the algorithms rely on approximate nearest neighbors (ANN) as a black box. In addition, a bucketing data structure is used in Ward's algorithm, and a subsampling procedure is used for average linkage, to guarantee the subquadratic runtime. This is a new contribution to the theoretical literature on HAC: a provably subquadratic algorithm for (an approximation to) HAC in cases other than single linkage.
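A minimal brute-force sketch of the gamma-best merge rule described above, under stated assumptions: centroid distance stands in for the Ward/average-linkage dissimilarities, exhaustive search stands in for the ANN black box (so this sketch is quadratic per round, not subquadratic), and all names are hypothetical:

```python
import math

def dist(a, b):
    """Euclidean distance between two points given as coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def gamma_best_hac(points, gamma=1.1):
    """Agglomerative clustering where each round may merge any pair of
    clusters whose distance is within a factor `gamma` of the closest pair."""
    clusters = [([p], p) for p in points]  # (member points, centroid)
    merges = []
    while len(clusters) > 1:
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        d = {ij: dist(clusters[ij[0]][1], clusters[ij[1]][1]) for ij in pairs}
        d_best = min(d.values())
        # any pair within gamma of the closest is an acceptable merge;
        # an ANN structure would hand us one such pair directly
        i, j = next(ij for ij in pairs if d[ij] <= gamma * d_best)
        pts = clusters[i][0] + clusters[j][0]
        centroid = tuple(sum(c) / len(pts) for c in zip(*pts))
        merges.append((clusters[i][0], clusters[j][0]))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [(pts, centroid)]
    return merges
```

For example, `gamma_best_hac([(0, 0), (0, 1), (10, 0), (10, 1)])` first merges the two nearby pairs before joining the two resulting clusters.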


Reviews: Approximation Bounds for Hierarchical Clustering: Average Linkage, Bisecting K-means, and Local Search

Neural Information Processing Systems

The paper extends the work of Dasgupta towards defining a theoretical framework for evaluating hierarchical clustering. The paper defines an objective function that is equivalent to Dasgupta's in the sense that an optimal solution for the cost function defined by Dasgupta is also optimal for the one defined in this paper, and vice versa. The behaviour of approximate solutions will differ, though, because the objective is of the form D(G) - C(T), where D(G) is a constant depending only on the input G, C(T) is the cost defined by Dasgupta, and T is the hierarchical solution. The authors show a constant approximation guarantee w.r.t. this new objective, and complement it by showing that the guarantee is almost tight.
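To make the D(G) - C(T) relationship concrete, here is a small worked check under the standard reading that the constant D(G) equals n times the total similarity weight (helper names are hypothetical, trees are nested tuples):

```python
def leaves(t):
    """Leaf labels under tree node t (a leaf label or a tuple of subtrees)."""
    if not isinstance(t, tuple):
        return {t}
    out = set()
    for c in t:
        out |= leaves(c)
    return out

def lca_size(t, i, j):
    """Number of leaves under the lowest common ancestor of i and j in t."""
    if isinstance(t, tuple):
        for c in t:
            if {i, j} <= leaves(c):
                return lca_size(c, i, j)
    return len(leaves(t))

# three points with unit pairwise similarity; the tree merges 0 and 1 first
w = {(0, 1): 1.0, (0, 2): 1.0, (1, 2): 1.0}
tree = ((0, 1), 2)
n = 3

cost = sum(wij * lca_size(tree, i, j) for (i, j), wij in w.items())  # Dasgupta's C(T)
revenue = sum(wij * (n - lca_size(tree, i, j)) for (i, j), wij in w.items())
D = n * sum(w.values())  # constant depending only on the input G

assert abs((D - cost) - revenue) < 1e-9  # the dual objective is exactly D(G) - C(T)
```

Because D(G) is fixed by the input, an exact optimum of one objective is an exact optimum of the other, yet a multiplicative approximation of C(T) need not translate into one for D(G) - C(T), which is why the two analyses diverge.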



Hierarchical clustering with OWA-based linkages, the Lance-Williams formula, and dendrogram inversions

Marek Gagolewski, Anna Cena, Simon James, Gleb Beliakov

arXiv.org Machine Learning

Agglomerative hierarchical clustering based on Ordered Weighted Averaging (OWA) operators not only generalises the single, complete, and average linkages, but also includes intercluster distances based on a few nearest or farthest neighbours, trimmed and winsorised means of pairwise point similarities, amongst many others. We explore the relationships between the famous Lance-Williams update formula and the extended OWA-based linkages with weights generated via infinite coefficient sequences. Furthermore, we provide some conditions for the weight generators to guarantee the resulting dendrograms to be free from unaesthetic inversions.
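As a sketch of the OWA idea described in the abstract (function and generator names are hypothetical, not the paper's notation): an OWA-based intercluster distance sorts all pairwise distances and combines them with a weight vector, so the classical linkages fall out as special weight choices:

```python
def owa_linkage(A, B, dist, weight_gen):
    """OWA intercluster distance: sort the pairwise distances between clusters
    A and B in non-increasing order, then dot them with weight_gen(m)."""
    d = sorted((dist(a, b) for a in A for b in B), reverse=True)
    w = weight_gen(len(d))
    return sum(wi * di for wi, di in zip(w, d))

# classical linkages recovered as special weight vectors
complete = lambda m: [1.0] + [0.0] * (m - 1)  # all weight on the largest distance
single = lambda m: [0.0] * (m - 1) + [1.0]    # all weight on the smallest distance
average = lambda m: [1.0 / m] * m             # uniform weights: the mean

A, B = [0.0, 1.0], [4.0, 6.0]
d = lambda a, b: abs(a - b)  # pairwise distances: 6, 5, 4, 3
print(owa_linkage(A, B, d, complete), owa_linkage(A, B, d, single), owa_linkage(A, B, d, average))
# 6.0 3.0 4.5
```

Other weight generators give the variants the abstract mentions, e.g. putting weight only on the k smallest entries yields a k-nearest-neighbour linkage, and zeroing the extremes yields trimmed means.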