cluster tree
- North America > United States > California > Santa Clara County > Mountain View (0.04)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.05)
- North America > United States > Washington > King County > Seattle (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Graphons, mergeons, and so on!
Justin Eldridge, Mikhail Belkin, Yusu Wang
In this work we develop a theory of hierarchical clustering for graphs. Our modeling assumption is that graphs are sampled from a graphon, which is a powerful and general model for generating graphs and analyzing large networks. Graphons are a far richer class of graph models than stochastic blockmodels, the primary setting for recent progress in the statistical theory of graph clustering. We define what it means for an algorithm to produce the "correct" clustering, give sufficient conditions in which a method is statistically consistent, and provide an explicit algorithm satisfying these properties.
- North America > United States > Ohio (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Export Reviews, Discussions, Author Feedback and Meta-Reviews
First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. The authors propose a novel approach for hierarchical clustering of multivariate data. They construct cluster trees by estimating minimum volume sets using the q-One-Class SVM, and evaluate their method on a synthetic data set and two real word applications. While their new method seems to perform better than other approaches based on density estimation, I am not convinced by the benefits in practical applicability as the authors did not compare their method to the most commonly used hierarchical clustering techniques (agglomerative clustering with average linkage/ward). Minor comment: Rather than splitting their data once in a training and test set, the authors should perform 10-fold/5-fold cross-validation for a more reliable estimation of the generalizability of their method.
- Overview (0.56)
- Research Report > Promising Solution (0.35)
Cluster Trees on Manifolds
We investigate the problem of estimating the cluster tree for a density $f$ supported on or near a smooth $d$-dimensional manifold $M$ isometrically embedded in $\mathbb{R}^D$. We study a $k$-nearest neighbor based algorithm recently proposed by Chaudhuri and Dasgupta. Under mild assumptions on $f$ and $M$, we obtain rates of convergence that depend on $d$ only but not on the ambient dimension $D$. We also provide a sample complexity lower bound for a natural class of clustering algorithms that use $D$-dimensional neighborhoods.
Approximating Hierarchical MV-sets for Hierarchical Clustering
The goal of hierarchical clustering is to construct a cluster tree, which can be viewed as the modal structure of a density. For this purpose, we use a convex optimization program that can efficiently estimate a family of hierarchical dense sets in high-dimensional distributions. We further extend existing graph-based methods to approximate the cluster tree of a distribution. By avoiding direct density estimation, our method is able to handle high-dimensional data more efficiently than existing density-based approaches. We present empirical results that demonstrate the superiority of our method over existing ones.
Bespoke multiresolution analysis of graph signals
Elefante, Giacomo, Giacchi, Gianluca, Multerer, Michael, Quizi, Jacopo
We present a novel framework for discrete multiresolution analysis of graph signals. The main analytical tool is the samplet transform, originally defined in the Euclidean framework as a discrete wavelet-like construction, tailored to the analysis of scattered data. The first contribution of this work is defining samplets on graphs. To this end, we subdivide the graph into a fixed number of patches, embed each patch into a Euclidean space, where we construct samplets, and eventually pull the construction back to the graph. This ensures orthogonality, locality, and the vanishing moments property with respect to properly defined polynomial spaces on graphs. Compared to classical Haar wavelets, this framework broadens the class of graph signals that can efficiently be compressed and analyzed. Along this line, we provide a definition of a class of signals that can be compressed using our construction. We support our findings with different examples of signals defined on graphs whose vertices lie on smooth manifolds. For efficient numerical implementation, we combine heavy edge clustering, to partition the graph into meaningful patches, with landmark \texttt{Isomap}, which provides low-dimensional embeddings for each patch. Our results demonstrate the method's robustness, scalability, and ability to yield sparse representations with controllable approximation error, significantly outperforming traditional Haar wavelet approaches in terms of compression efficiency and multiresolution fidelity.
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- North America > United States > New York (0.04)
- North America > United States > Minnesota (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Comparison between neural network clustering, hierarchical clustering and k-means clustering: Applications using fluidic lenses
A comparison between neural network clustering (NNC), hierarchical clustering (HC) and K-means clustering (KMC) is performed to evaluate the computational superiority of these three machine learning (ML) techniques for organizing large datasets into clusters. For NNC, a self-organizing map (SOM) training was applied to a collection of wavefront sensor reconstructions, decomposed in terms of 15 Zernike coefficients, characterizing the optical aberrations of the phase front transmitted by fluidic lenses. In order to understand the distribution and structure of the 15 Zernike variables within an input space, SOM-neighboring weight distances, SOM-sample hits, SOM-weight positions and SOM-weight planes were analyzed to form a visual interpretation of the system's structural properties. In the case of HC, the data was partitioned using a combined dissimilarity-linkage matrix computation. The effectiveness of this method was confirmed by a high cophenetic correlation coefficient value (c=0.9651). Additionally, a maximum number of clusters was established by setting an inconsistency cutoff of 0.8, yielding a total of 7 clusters for system segmentation. In addition, a KMC approach was employed to establish a quantitative measure of clustering segmentation efficiency, obtaining a sillhoute average value of 0.905 for data segmentation into K=5 non-overlapping clusters. On the other hand, the NNC analysis revealed that the 15 variables could be characterized through the collective influence of 8 clusters. It was established that the formation of clusters through the combined linkage and dissimilarity algorithms of HC alongside KMC is a more dependable clustering solution than separate assessment via NNC or HC, where altering the SOM size or inconsistency cutoff can lead to completely new clustering configurations.
Approximating Hierarchical MV-sets for Hierarchical Clustering
Assaf Glazer, Omer Weissbrod, Michael Lindenbaum, Shaul Markovitch
The goal of hierarchical clustering is to construct a cluster tree, which can be viewed as the modal structure of a density. For this purpose, we use a convex optimization program that can efficiently estimate a family of hierarchical dense sets in high-dimensional distributions. We further extend existing graph-based methods to approximate the cluster tree of a distribution. By avoiding direct density estimation, our method is able to handle high-dimensional data more efficiently than existing density-based approaches. We present empirical results that demonstrate the superiority of our method over existing ones.
- North America > United States > Utah (0.04)
- North America > United States > New York (0.04)
- Europe > United Kingdom > Scotland (0.04)
- (7 more...)
Reviews: Flattening a Hierarchical Clustering through Active Learning
This paper derives complexity results for active learning queries to hierarchical clustering. The result is a partition or "cut", c, of the cluster tree, where the "flat" clustering is defined by the clusters at the leaves of a subtree of nodes AB(c) that have the same root as the original cluster tree. Learning occurs by making pairwise judgments on items (leaf nodes). All pairwise judgments form a "ground truth" matrix \Sigma. Given consistency conditions, this is an equivalent way to represent a clustering.