Goto

Collaborating Authors

 Clustering


Dissimilarity Mixture Autoencoder for Deep Clustering

arXiv.org Machine Learning

In this paper, we introduce the Dissimilarity Mixture Autoencoder (DMAE), a novel neural network model that uses a dissimilarity function to generalize a family of density estimation and clustering methods. It is formulated in such a way that it internally estimates the parameters of a probability distribution through gradient-based optimization. Also, the proposed model can leverage from deep representation learning due to its straightforward incorporation into deep learning architectures, because, it consists of an encoder-decoder network that computes a probabilistic representation. Experimental evaluation was performed on image and text clustering benchmark datasets showing that the method is competitive in terms of unsupervised classification accuracy and normalized mutual information. The source code to replicate the experiments is publicly available at https://github.com/larajuse/DMAE


BETULA: Numerically Stable CF-Trees for BIRCH Clustering

arXiv.org Machine Learning

BIRCH clustering is a widely known approach for clustering, that has influenced much subsequent research and commercial products. The key contribution of BIRCH is the Clustering Feature tree (CF-Tree), which is a compressed representation of the input data. As new data arrives, the tree is eventually rebuilt to increase the compression. Afterward, the leaves of the tree are used for clustering. Because of the data compression, this method is very scalable. The idea has been adopted for example for k-means, data stream, and density-based clustering. Clustering features used by BIRCH are simple summary statistics that can easily be updated with new data: the number of points, the linear sums, and the sum of squared values. Unfortunately, how the sum of squares is then used in BIRCH is prone to catastrophic cancellation. We introduce a replacement cluster feature that does not have this numeric problem, that is not much more expensive to maintain, and which makes many computations simpler and hence more efficient. These cluster features can also easily be used in other work derived from BIRCH, such as algorithms for streaming data. In the experiments, we demonstrate the numerical problem and compare the performance of the original algorithm compared to the improved cluster features.


Revealing consensus and dissensus between network partitions

arXiv.org Machine Learning

Community detection methods attempt to divide a network into groups of nodes that share similar properties, thus revealing its large-scale structure. A major challenge when employing such methods is that they are often degenerate, typically yielding a complex landscape of competing answers. As an attempt to extract understanding from a population of alternative solutions, many methods exist to establish a consensus among them in the form of a single partition "point estimate" that summarizes the whole distribution. Here we show that it is in general not possible to obtain a consistent answer from such point estimates when the underlying distribution is too heterogeneous. As an alternative, we provide a comprehensive set of methods designed to characterize and summarize complex populations of partitions in a manner that captures not only the existing consensus, but also the dissensus between elements of the population. Our approach is able to model mixed populations of partitions where multiple consensuses can coexist, representing different competing hypotheses for the network structure. We also show how our methods can be used to compare pairs of partitions, how they can be generalized to hierarchical divisions, and be used to perform statistical model selection between competing hypotheses.


Distributional Individual Fairness in Clustering

arXiv.org Machine Learning

In this paper, we initiate the study of fair clustering that ensures distributional similarity among similar individuals. In response to improving fairness in machine learning, recent papers have investigated fairness in clustering algorithms and have focused on the paradigm of statistical parity/group fairness. These efforts attempt to minimize bias against some protected groups in the population. However, to the best of our knowledge, the alternative viewpoint of individual fairness, introduced by Dwork et al. (ITCS 2012) in the context of classification, has not been considered for clustering so far. Similar to Dwork et al., we adopt the individual fairness notion which mandates that similar individuals should be treated similarly for clustering problems. We use the notion of $f$-divergence as a measure of statistical similarity that significantly generalizes the ones used by Dwork et al. We introduce a framework for assigning individuals, embedded in a metric space, to probability distributions over a bounded number of cluster centers. The objective is to ensure (a) low cost of clustering in expectation and (b) individuals that are close to each other in a given fairness space are mapped to statistically similar distributions. We provide an algorithm for clustering with $p$-norm objective ($k$-center, $k$-means are special cases) and individual fairness constraints with provable approximation guarantee. We extend this framework to include both group fairness and individual fairness inside the protected groups. Finally, we observe conditions under which individual fairness implies group fairness. We present extensive experimental evidence that justifies the effectiveness of our approach.


An Efficient Smoothing Proximal Gradient Algorithm for Convex Clustering

arXiv.org Machine Learning

Cluster analysis organizes data into sensible groupings and is one of fundamental modes of understanding and learning. The widely used K-means and hierarchical clustering methods can be dramatically suboptimal due to local minima. Recently introduced convex clustering approach formulates clustering as a convex optimization problem and ensures a globally optimal solution. However, the state-of-the-art convex clustering algorithms, based on the alternating direction method of multipliers (ADMM) or the alternating minimization algorithm (AMA), require large computation and memory space, which limits their applications. In this paper, we develop a very efficient smoothing proximal gradient algorithm (Sproga) for convex clustering. Our Sproga is faster than ADMM- or AMA-based convex clustering algorithms by one to two orders of magnitude. The memory space required by Sproga is less than that required by ADMM and AMA by at least one order of magnitude. Computer simulations and real data analysis show that Sproga outperforms several well known clustering algorithms including K-means and hierarchical clustering. The efficiency and superior performance of our algorithm will help convex clustering to find its wide application.


A Multiscale Graph Convolutional Network Using Hierarchical Clustering

arXiv.org Machine Learning

The information contained in hierarchical topology, intrinsic to many networks, is currently underutilised. A novel architecture is explored which exploits this information through a multiscale decomposition. A dendrogram is produced by a Girvan-Newman hierarchical clustering algorithm. It is segmented and fed through graph convolutional layers, allowing the architecture to learn multiple scale latent space representations of the network, from fine to coarse grained. The architecture is tested on a benchmark citation network, demonstrating competitive performance. Given the abundance of hierarchical networks, possible applications include quantum molecular property prediction, protein interface prediction and multiscale computational substrates for partial differential equations.


Information Mandala: Statistical Distance Matrix with Clustering

arXiv.org Machine Learning

In machine learning, observation features are measured in a metric space to obtain their distance function for optimization. Given similar features that are statistically sufficient as a population, a statistical distance between two probability distributions can be calculated for more precise learning. Provided the observed features are multi-valued, the statistical distance function is still efficient. However, due to its scalar output, it cannot be applied to represent detailed distances between feature elements. To resolve this problem, this paper extends the traditional statistical distance to a matrix form, called a statistical distance matrix. In experiments, the proposed approach performs well in object recognition tasks and clearly and intuitively represents the dissimilarities between cat and dog images in the CIFAR dataset, even when directly calculated using the image pixels. By using the hierarchical clustering of the statistical distance matrix, the image pixels can be separated into several clusters that are geometrically arranged around a center like a Mandala pattern. The statistical distance matrix with clustering, called the Information Mandala, is beyond ordinary saliency maps and can help to understand the basic principles of the convolution neural network.


Subspace Clustering for Action Recognition with Covariance Representations and Temporal Pruning

arXiv.org Artificial Intelligence

Despite the fact that subspace clustering has become a powerful Given a trimmed sequence, in which a single action or activity technique for problems such as face clustering or digit is assumed to be present, the final goal of HAR is to correctly recognition, its applicability to the problems like skeletonbased classifying it. Although significant progresses have been made HAR was only explored by a limited number of works in the last years, accurate action recognition in videos is still a [7], [8], [9]. This is due to many operative limitations including challenging task because of the complexity of the visual data how to handle the temporal dimensions, the inherent noise e.g., due to varying camera viewpoints, occlusions and abrupt present in the skeletal data and the related computational changes in lighting conditions.


Fair clustering via equitable group representations

arXiv.org Machine Learning

What does it mean for a clustering to be fair? One popular approach seeks to ensure that each cluster contains groups in (roughly) the same proportion in which they exist in the population. The normative principle at play is balance: any cluster might act as a representative of the data, and thus should reflect its diversity. But clustering also captures a different form of representativeness. A core principle in most clustering problems is that a cluster center should be representative of the cluster it represents, by being "close" to the points associated with it. This is so that we can effectively replace the points by their cluster centers without significant loss in fidelity, and indeed is a common "use case" for clustering. For such a clustering to be fair, the centers should "represent" different groups equally well. We call such a clustering a group-representative clustering. In this paper, we study the structure and computation of group-representative clusterings. We show that this notion naturally parallels the development of fairness notions in classification, with direct analogs of ideas like demographic parity and equal opportunity. We demonstrate how these notions are distinct from and cannot be captured by balance-based notions of fairness. We present approximation algorithms for group representative $k$-median clustering and couple this with an empirical evaluation on various real-world data sets.


On identifying clusters from sum-of-norms clustering computation

arXiv.org Machine Learning

Sum-of-norms clustering is a clustering formulation based on convex optimization that automatically induces hierarchy. Multiple algorithms have been proposed to solve the optimization problem: subgradient descent by Hocking et al.\ \cite{hocking}, ADMM and ADA by Chi and Lange\ \cite{Chi}, stochastic incremental algorithm by Panahi et al.\ \cite{Panahi} and semismooth Newton-CG augmented Lagrangian method by Yuan et al.\ \cite{dsun1}. All algorithms yield approximate solutions, even though an exact solution is demanded to determine the correct cluster assignment. The purpose of this paper is to close the gap between the output from existing algorithms and the exact solution to the optimization problem. We present a clustering test which identifies and certifies the correct cluster assignment from an approximate solution yielded by any primal-dual algorithm. The test may not succeed if the approximation is inaccurate. However, we show the correct cluster assignment is guaranteed to be found by a symmetric primal-dual path following algorithm after sufficiently many iterations, provided that the model parameter $\lambda$ avoids a finite number of bad values. Numerical experiments are implemented to support our results.