Subquadratic High-Dimensional Hierarchical Clustering
–Neural Information Processing Systems
We consider the widely-used average-linkage, single-linkage, and Ward's methods for computing hierarchical clusterings of high-dimensional Euclidean inputs. It is easy to show that there is no efficient implementation of these algorithms in high dimensional Euclidean space since it implicitly requires to solve the closest pair problem, a notoriously difficult problem. However, how fast can these algorithms be implemented if we allow approximation? More precisely: these algorithms successively merge the clusters that are at closest average (for average-linkage), minimum distance (for single-linkage), or inducing the least sum-of-square error (for Ward's). We ask whether one could obtain a significant running-time improvement if the algorithm can merge \gamma -approximate closest clusters (namely, clusters that are at distance (average, minimum, or sum-of-square error) at most \gamma times the distance of the closest clusters).
Neural Information Processing Systems
Oct-11-2024, 01:43:56 GMT
- Technology: