Hierarchical Clustering with Structural Constraints
Vaggos Chatziafratis, Rad Niazadeh, Moses Charikar
–arXiv.org Artificial Intelligence
Hierarchical clustering (HC) is a widely used data analysis tool, ubiquitous in information retrieval, data mining, and machine learning (see the survey by Berkhin [2006]). This clustering technique represents a given dataset as a binary tree: each leaf represents an individual data point, and each internal node represents the cluster formed by its descendant leaves. HC has become the most popular method for gene expression data analysis Eisen et al. [1998], and has also been used in the analysis of social networks Leskovec et al. [2014], Mann et al. [2008], bioinformatics Diez et al. [2015], image and text classification Steinbach et al. [2000], and even the analysis of financial markets Tumminello et al. [2010]. It is attractive because, compared to more traditional flat clustering approaches like k-means or k-median, it provides richer information at all levels of granularity simultaneously. Recently, Dasgupta [2016] formulated HC as a combinatorial optimization problem, giving a principled way to compare the performance of different HC algorithms. This optimization viewpoint has since received a lot of attention Roy and Pokutta [2016], Charikar and Chatziafratis [2017], Cohen-Addad et al. [2017], Moseley and Wang [2017], Cohen-Addad et al. [2018], leading not only to new algorithms but also to theoretical justifications for the observed success of popular HC algorithms.
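To make Dasgupta's optimization viewpoint concrete, here is a minimal sketch of his cost function on a toy instance. The nested-tuple tree encoding, the function names, and the example weights are illustrative assumptions, not the paper's implementation; Dasgupta's cost charges each similarity edge (i, j) the number of leaves under the lowest common ancestor of i and j, so trees that separate dissimilar points early incur lower cost.

```python
def leaves(tree):
    # A tree is either a leaf label (e.g. an int) or a pair (left, right).
    if isinstance(tree, tuple):
        return leaves(tree[0]) | leaves(tree[1])
    return {tree}

def dasgupta_cost(tree, w):
    # cost(T) = sum over similarity edges (i, j) of
    #   w[i, j] * |leaves under the lowest common ancestor of i and j|.
    # Every cross edge at an internal node has that node as its LCA,
    # so we charge cross edges here and recurse into both subtrees.
    if not isinstance(tree, tuple):
        return 0
    left, right = leaves(tree[0]), leaves(tree[1])
    n = len(left) + len(right)
    cross = sum(w.get((min(i, j), max(i, j)), 0)
                for i in left for j in right)
    return cross * n + dasgupta_cost(tree[0], w) + dasgupta_cost(tree[1], w)

# Toy example (hypothetical weights): 4 points, strong similarity
# inside {0, 1} and {2, 3}, weak similarity across the two groups.
w = {(0, 1): 3, (2, 3): 3, (0, 2): 1, (1, 3): 1}
good = ((0, 1), (2, 3))   # cuts the weak edges at the root
bad = ((0, 2), (1, 3))    # cuts the strong edges at the root
print(dasgupta_cost(good, w), dasgupta_cost(bad, w))  # → 20 28
```

As expected, the tree that postpones cutting high-similarity edges achieves the lower cost, which is exactly the behavior the objective is designed to reward.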
May-23-2018