Reviews: Subquadratic High-Dimensional Hierarchical Clustering
–Neural Information Processing Systems
This paper proposes a new approach to approximating hierarchical agglomerative clustering (HAC) by requiring that at each round, only a gamma-best merge be performed (gamma being the multiplicative approximation factor to the closest pair). Two algorithms are introduced to approximate HAC - one for Ward and one for Average linkage. In both cases, the algorithms rely on using approximate nearest neighbor ANN as a black box. In addition, a bucketing datastructure is used in Wards algorithm and a subsampling procedure in used for Average linkage to guarantee the subquadratic runtime. This is a new contribution to the theoretical literature on HAC, a provable subquadratic algorithm for (an approximation to) HAC cases other than single linkage.
Neural Information Processing Systems
Jan-27-2025, 11:27:33 GMT
- Technology: