Goto

Collaborating Authors

 cluster tree




Graphons, mergeons, and so on!

Justin Eldridge, Mikhail Belkin, Yusu Wang

Neural Information Processing Systems

In this work we develop a theory of hierarchical clustering for graphs. Our modeling assumption is that graphs are sampled from a graphon, which is a powerful and general model for generating graphs and analyzing large networks. Graphons are a far richer class of graph models than stochastic blockmodels, the primary setting for recent progress in the statistical theory of graph clustering. We define what it means for an algorithm to produce the "correct" clustering, give sufficient conditions in which a method is statistically consistent, and provide an explicit algorithm satisfying these properties.


Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems

First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. The authors propose a novel approach for hierarchical clustering of multivariate data. They construct cluster trees by estimating minimum volume sets using the q-One-Class SVM, and evaluate their method on a synthetic data set and two real word applications. While their new method seems to perform better than other approaches based on density estimation, I am not convinced by the benefits in practical applicability as the authors did not compare their method to the most commonly used hierarchical clustering techniques (agglomerative clustering with average linkage/ward). Minor comment: Rather than splitting their data once in a training and test set, the authors should perform 10-fold/5-fold cross-validation for a more reliable estimation of the generalizability of their method.


Cluster Trees on Manifolds

Neural Information Processing Systems

We investigate the problem of estimating the cluster tree for a density $f$ supported on or near a smooth $d$-dimensional manifold $M$ isometrically embedded in $\mathbb{R}^D$. We study a $k$-nearest neighbor based algorithm recently proposed by Chaudhuri and Dasgupta. Under mild assumptions on $f$ and $M$, we obtain rates of convergence that depend on $d$ only but not on the ambient dimension $D$. We also provide a sample complexity lower bound for a natural class of clustering algorithms that use $D$-dimensional neighborhoods.


Approximating Hierarchical MV-sets for Hierarchical Clustering

Neural Information Processing Systems

The goal of hierarchical clustering is to construct a cluster tree, which can be viewed as the modal structure of a density. For this purpose, we use a convex optimization program that can efficiently estimate a family of hierarchical dense sets in high-dimensional distributions. We further extend existing graph-based methods to approximate the cluster tree of a distribution. By avoiding direct density estimation, our method is able to handle high-dimensional data more efficiently than existing density-based approaches. We present empirical results that demonstrate the superiority of our method over existing ones.


Bespoke multiresolution analysis of graph signals

Elefante, Giacomo, Giacchi, Gianluca, Multerer, Michael, Quizi, Jacopo

arXiv.org Artificial Intelligence

We present a novel framework for discrete multiresolution analysis of graph signals. The main analytical tool is the samplet transform, originally defined in the Euclidean framework as a discrete wavelet-like construction, tailored to the analysis of scattered data. The first contribution of this work is defining samplets on graphs. To this end, we subdivide the graph into a fixed number of patches, embed each patch into a Euclidean space, where we construct samplets, and eventually pull the construction back to the graph. This ensures orthogonality, locality, and the vanishing moments property with respect to properly defined polynomial spaces on graphs. Compared to classical Haar wavelets, this framework broadens the class of graph signals that can efficiently be compressed and analyzed. Along this line, we provide a definition of a class of signals that can be compressed using our construction. We support our findings with different examples of signals defined on graphs whose vertices lie on smooth manifolds. For efficient numerical implementation, we combine heavy edge clustering, to partition the graph into meaningful patches, with landmark \texttt{Isomap}, which provides low-dimensional embeddings for each patch. Our results demonstrate the method's robustness, scalability, and ability to yield sparse representations with controllable approximation error, significantly outperforming traditional Haar wavelet approaches in terms of compression efficiency and multiresolution fidelity.


Comparison between neural network clustering, hierarchical clustering and k-means clustering: Applications using fluidic lenses

Puentes, Graciana

arXiv.org Artificial Intelligence

A comparison between neural network clustering (NNC), hierarchical clustering (HC) and K-means clustering (KMC) is performed to evaluate the computational superiority of these three machine learning (ML) techniques for organizing large datasets into clusters. For NNC, a self-organizing map (SOM) training was applied to a collection of wavefront sensor reconstructions, decomposed in terms of 15 Zernike coefficients, characterizing the optical aberrations of the phase front transmitted by fluidic lenses. In order to understand the distribution and structure of the 15 Zernike variables within an input space, SOM-neighboring weight distances, SOM-sample hits, SOM-weight positions and SOM-weight planes were analyzed to form a visual interpretation of the system's structural properties. In the case of HC, the data was partitioned using a combined dissimilarity-linkage matrix computation. The effectiveness of this method was confirmed by a high cophenetic correlation coefficient value (c=0.9651). Additionally, a maximum number of clusters was established by setting an inconsistency cutoff of 0.8, yielding a total of 7 clusters for system segmentation. In addition, a KMC approach was employed to establish a quantitative measure of clustering segmentation efficiency, obtaining a sillhoute average value of 0.905 for data segmentation into K=5 non-overlapping clusters. On the other hand, the NNC analysis revealed that the 15 variables could be characterized through the collective influence of 8 clusters. It was established that the formation of clusters through the combined linkage and dissimilarity algorithms of HC alongside KMC is a more dependable clustering solution than separate assessment via NNC or HC, where altering the SOM size or inconsistency cutoff can lead to completely new clustering configurations.


Approximating Hierarchical MV-sets for Hierarchical Clustering

Assaf Glazer, Omer Weissbrod, Michael Lindenbaum, Shaul Markovitch

Neural Information Processing Systems

The goal of hierarchical clustering is to construct a cluster tree, which can be viewed as the modal structure of a density. For this purpose, we use a convex optimization program that can efficiently estimate a family of hierarchical dense sets in high-dimensional distributions. We further extend existing graph-based methods to approximate the cluster tree of a distribution. By avoiding direct density estimation, our method is able to handle high-dimensional data more efficiently than existing density-based approaches. We present empirical results that demonstrate the superiority of our method over existing ones.


Reviews: Flattening a Hierarchical Clustering through Active Learning

Neural Information Processing Systems

This paper derives complexity results for active learning queries to hierarchical clustering. The result is a partition or "cut", c, of the cluster tree, where the "flat" clustering is defined by the clusters at the leaves of a subtree of nodes AB(c) that have the same root as the original cluster tree. Learning occurs by making pairwise judgments on items (leaf nodes). All pairwise judgments form a "ground truth" matrix \Sigma. Given consistency conditions, this is an equivalent way to represent a clustering.