Guarantees for Hierarchical Clustering by the Sublevel Set method

Meila, Marina

arXiv.org Machine Learning 

Compared to (simple) clustering data into K clusters, hierarchical clustering is much more complex and much less understood. One of the few seminal advances in hierarchical clusterings is the introduction by Dasgupta (2016) of a general yet simple paradigm of hierarchical clustering as loss minimization. This paradigm was expanded by Charikar and Chatziafratis (2016) and Roy and Pokutta (2016). The latter work also introduces a new set of techniques for obtaining hierarchical clusterings by showing that optimizing the loss can be relaxed to a Linear Program (LP). This paper introduces the first method to obtain optimality guarantees in the context of hierarchical clustering. Specifically, it is shown that the Sublevel Set (SS) paradigm invented by Meila (2018) for simple, nonhiearchical clustering, can be extended as well to hierarchical clustering. The main contribution is show that there is a natural distance between hierarchical clusterings whose properties can be exploited in the setting of the SS problem we will present in Section 3. The Sublevel Set method produces stability theorems of the following form.

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found