Unsupervised Embedding of Hierarchical Structure in Euclidean Space
Zhao, Jinyu, Hao, Yi, Rashtchian, Cyrus
Hierarchical clustering aims to find an iterative grouping of increasingly large subsets of data, terminating when one cluster contains all of the data. This results in a tree structure, where each level determines a partition of the data. Organizing the data in such a way has many applications, including automated taxonomy construction and phylogeny reconstruction [6, 26, 54, 69, 87]. Motivated by these applications, we address the simultaneous goals of recovering an underlying clustering and building a hierarchical tree of the entire dataset. We imagine that we have access to many samples from each individual cluster (e.g., images of plants and animals), while we lack labels or any direct information about the hierarchical relationships. In this scenario, our objective is to correctly cluster the samples in each group and also build a dendrogram within and on top of the clusters that matches the natural supergroups. Despite significant effort, hierarchical clustering remains a challenging task, theoretically and empirically. Many objectives are NP-Hard [13, 14, 18, 19, 35, 59], and many theoretical algorithms may not be practical because they require solving a hard subproblem (e.g., sparsest cut) at each step [2, 4, 13, 14, 19, 51].
Oct-29-2020
- Country:
- North America
- Canada > Ontario
- Toronto (0.14)
- United States
- California (0.14)
- New York (0.14)
- Canada > Ontario
- North America
- Genre:
- Research Report (0.64)
- Technology: