Clustering
Multiple Flat Projections for Cross-manifold Clustering
Bai, Lan, Shao, Yuan-Hai, Chen, Wei-Jie, Wang, Zhen, Deng, Nai-Yang
Cross-manifold clustering is a hard topic and many traditional clustering methods fail because of the cross-manifold structures. In this paper, we propose a Multiple Flat Projections Clustering (MFPC) to deal with cross-manifold clustering problems. In our MFPC, the given samples are projected into multiple subspaces to discover the global structures of the implicit manifolds. Thus, the cross-manifold clusters are distinguished from the various projections. Further, our MFPC is extended to nonlinear manifold clustering via kernel tricks to deal with more complex cross-manifold clustering. A series of non-convex matrix optimization problems in MFPC are solved by a proposed recursive algorithm. The synthetic tests show that our MFPC works on the cross-manifold structures well. Moreover, experimental results on the benchmark datasets show the excellent performance of our MFPC compared with some state-of-the-art clustering methods.
Dip-means: an incremental clustering method for estimating the number of clusters
Kalogeratos, Argyris, Likas, Aristidis
Learning the number of clusters is a key problem in data clustering. We present dip-means, a novel robust incremental method to learn the number of data clusters that may be used as a wrapper around any iterative clustering algorithm of the k-means family. In contrast to many popular methods which make assumptions about the underlying cluster distributions, dip-means only assumes a fundamental cluster property: each cluster to admit a unimodal distribution. The proposed algorithm considers each cluster member as a ''viewer'' and applies a univariate statistic hypothesis test for unimodality (dip-test) on the distribution of the distances between the viewer and the cluster members. Two important advantages are: i) the unimodality test is applied on univariate distance vectors, ii) it can be directly applied with kernel-based methods, since only the pairwise distances are involved in the computations. Experimental results on artificial and real datasets indicate the effectiveness of our method and its superiority over analogous approaches.
Optimal Scoring for Unsupervised Learning
We are often interested in casting classification and clustering problems in a regression framework, because it is feasible to achieve some statistical properties in this framework by imposing some penalty criteria. In this paper we illustrate optimal scoring, which was originally proposed for performing Fisher linear discriminant analysis by regression, in the application of unsupervised learning. In particular, we devise a novel clustering algorithm that we call optimal discriminant clustering (ODC). Thus, our work shows that optimal scoring provides a new approach to the implementation of unsupervised learning. Papers published at the Neural Information Processing Systems Conference.
Regularized Co-Clustering with Dual Supervision
Sindhwani, Vikas, Hu, Jianying, Mojsilovic, Aleksandra
By attempting to simultaneously partition both the rows (examples) and columns (features) of a data matrix, Co-clustering algorithms often demonstrate surpris- ingly impressive performance improvements over traditional one-sided (row) clustering techniques. A good clustering of features may be seen as a combinatorial transformation of the data matrix, effectively enforcing a form of regularization that may lead to a better clustering of examples (and vice-versa). In many applications, partial supervision in the form of a few row labels as well as column labels may be available to potentially assist co-clustering. In this paper, we develop two novel semi-supervised multi-class classification algorithms motivated respectively by spectral bipartite graph partitioning and matrix approximation (e.g., non-negative matrix factorization) formulations for co-clustering. These algorithms (i) support dual supervision in the form of labels for both examples and/or features, (ii) provide principled predictive capability on out-of-sample test data, and (iii) arise naturally from the classical Representer theorem applied to regularization problems posed on a collection of Reproducing Kernel Hilbert Spaces.
On the Reliability of Clustering Stability in the Large Sample Regime
Clustering stability is an increasingly popular family of methods for performing model selection in data clustering. The basic idea is that the chosen model should be stable under perturbation or resampling of the data. Despite being reasonably effective in practice, these methods are not well understood theoretically, and present some difficulties. In particular, when the data is assumed to be sampled from an underlying distribution, the solutions returned by the clustering algorithm will usually become more and more stable as the sample size increases. This raises a potentially serious practical difficulty with these methods, because it means there might be some hard-to-compute sample size, beyond which clustering stability estimators'break down' and become unreliable in detecting the most stable model. Namely, all models will be relatively stable, with differences in their stability measures depending mostly on random and meaningless sampling artifacts.
A Game-Theoretic Approach to Hypergraph Clustering
Bulò, Samuel R., Pelillo, Marcello
Hypergraph clustering refers to the process of extracting maximally coherent groups from a set of objects using high-order (rather than pairwise) similarities. Traditional approaches to this problem are based on the idea of partitioning the input data into a user-defined number of classes, thereby obtaining the clusters as a by-product of the partitioning process. In this paper, we provide a radically different perspective to the problem. In contrast to the classical approach, we attempt to provide a meaningful formalization of the very notion of a cluster and we show that game theory offers an attractive and unexplored perspective that serves well our purpose. Specifically, we show that the hypergraph clustering problem can be naturally cast into a non-cooperative multi-player clustering game, whereby the notion of a cluster is equivalent to a classical game-theoretic equilibrium concept. From the computational viewpoint, we show that the problem of finding the equilibria of our clustering game is equivalent to locally optimizing a polynomial function over the standard simplex, and we provide a discrete-time dynamics to perform this optimization.
Minimum Average Cost Clustering
Nagano, Kiyohito, Kawahara, Yoshinobu, Iwata, Satoru
A number of objective functions in clustering problems can be described with submodular functions. In this paper, we introduce the minimum average cost criterion, and show that the theory of intersecting submodular functions can be used for clustering with submodular objective functions. The proposed algorithm does not require the number of clusters in advance, and it will be determined by the property of a given set of data points. The minimum average cost clustering problem is parameterized with a real variable, and surprisingly, we show that all information about optimal clusterings for all parameters can be computed in polynomial time in total. Additionally, we evaluate the performance of the proposed algorithm through computational experiments.
Efficient Optimization for Discriminative Latent Class Models
Joulin, Armand, Ponce, Jean, Bach, Francis R.
Dimensionality reduction is commonly used in the setting of multi-label supervised classification to control the learning capacity and to provide a meaningful representation of the data. We introduce a simple forward probabilistic model which is a multinomial extension of reduced rank regression; we show that this model provides a probabilistic interpretation of discriminative clustering methods with added benefits in terms of number of hyperparameters and optimization. While expectation-maximization (EM) algorithm is commonly used to learn these models, its optimization usually leads to local minimum because it relies on a non-convex cost function with many such local minima. To avoid this problem, we introduce a local approximation of this cost function, which leads to a quadratic non-convex optimization problem over a product of simplices. In order to minimize such functions, we propose an efficient algorithm based on convex relaxation and low-rank representation of our data, which allows to deal with large instances.
Unsupervised Feature Selection for the k-means Clustering Problem
Boutsidis, Christos, Drineas, Petros, Mahoney, Michael W.
We present a novel feature selection algorithm for the $k$-means clustering problem. Our algorithm is randomized and, assuming an accuracy parameter $\epsilon \in (0,1)$, selects and appropriately rescales in an unsupervised manner $\Theta(k \log(k / \epsilon) / \epsilon 2)$ features from a dataset of arbitrary dimensions. We prove that, if we run any $\gamma$-approximate $k$-means algorithm ($\gamma \geq 1$) on the features selected using our method, we can find a $(1 (1 \epsilon)\gamma)$-approximate partition with high probability. Papers published at the Neural Information Processing Systems Conference.
Random Projections for k-means Clustering
Boutsidis, Christos, Zouzias, Anastasios, Drineas, Petros
We prove that any set of $n$ points in $d$ dimensions (rows in a matrix $A \in \RR {n \times d}$) can be projected into $t \Omega(k / \eps 2)$ dimensions, for any $\eps \in (0,1/3)$, in $O(n d \lceil \eps {-2} k/ \log(d) \rceil)$ time, such that with constant probability the optimal $k$-partition of the point set is preserved within a factor of $2 \eps$. The projection is done by post-multiplying $A$ with a $d \times t$ random matrix $R$ having entries $ 1/\sqrt{t}$ or $-1/\sqrt{t}$ with equal probability. A numerical implementation of our technique and experiments on a large face images dataset verify the speed and the accuracy of our theoretical results. Papers published at the Neural Information Processing Systems Conference.