Hierarchical Clustering Beyond the Worst-Case
Vincent Cohen-Addad, Varun Kanade, Frederik Mallmann-Trenn
Neural Information Processing Systems
Hierarchical clustering, that is, computing a recursive partitioning of a dataset to obtain clusters at increasingly finer granularity, is a fundamental problem in data analysis. Although hierarchical clustering has mostly been studied through procedures such as linkage algorithms, or top-down heuristics, rather than as optimization problems, Dasgupta [9] recently proposed an objective function for hierarchical clustering and initiated a line of work developing algorithms that explicitly optimize an objective (see also [7, 22, 8]). In this paper, we consider a fairly general random graph model for hierarchical clustering, called the hierarchical stochastic block model (HSBM), and show that in certain regimes the SVD approach of McSherry [18] combined with specific linkage methods results in a clustering that gives an O(1) approximation to Dasgupta's cost function. Finally, we report empirical evaluation on synthetic and real-world data showing that our proposed SVD-based method does indeed achieve a better cost than other widely-used heuristics and also results in a better classification accuracy when the underlying problem is that of multi-class classification.
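The pipeline the abstract describes (project the graph's adjacency matrix via a low-rank SVD, then run a linkage method on the embedded points) can be sketched as follows. This is a minimal illustration on a toy two-block stochastic block model, not the authors' exact algorithm; the block sizes, edge probabilities, and the choice of average linkage are assumptions for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)

# Toy 2-block stochastic block model: edges are denser within blocks
# (p_in) than across them (p_out). These parameters are illustrative.
n, k = 40, 2
blocks = np.repeat([0, 1], n // 2)
p_in, p_out = 0.6, 0.1
probs = np.where(blocks[:, None] == blocks[None, :], p_in, p_out)
A = (rng.random((n, n)) < probs).astype(float)
A = np.triu(A, 1)
A = A + A.T  # symmetric adjacency matrix, no self-loops

# Rank-k SVD projection, in the spirit of McSherry's spectral approach:
# embed each node by its coordinates in the top-k singular directions.
U, S, Vt = np.linalg.svd(A)
X = U[:, :k] * S[:k]

# Agglomerative (average-linkage) clustering on the spectral embedding.
Z = linkage(X, method="average")
pred = fcluster(Z, t=k, criterion="maxclust")
```

Cutting the resulting dendrogram at two clusters (`fcluster` with `maxclust`) recovers a flat partition; the full tree `Z` is the hierarchical clustering whose cost one would evaluate under Dasgupta's objective.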