Clustering is an old research topic in data mining and machine learning communities. Most of the traditional clustering methods can be categorized local or global ones. In this paper, a novel clustering method that can explore both the local and global information in the dataset is proposed. The method, Clustering with Local and Global Consistency (CLGR), aims to minimize a cost function that properly trades off the local and global costs. We will show that such an optimization problem can be solved by the eigenvalue decomposition of a sparse symmetric matrix, which can be done efficiently by some iterative methods. Finally the experimental results on several datasets are presented to show the effectiveness of our method.

Nie, Feiping (Tsinghua University) | Xu, Dong (Nanyang Technological University) | Tsang, Ivor Wai-Hung (Nanyang Technological University) | Zhang, Changshui (Tsinghua University)

In this paper, we propose a new spectral clustering method, referred to as Spectral Embedded Clustering (SEC), to minimize the normalized cut criterion in spectral clustering as well as control the mismatch between the cluster assignment matrix and the low dimensional embedded representation of the data. SEC is based on the observation that the cluster assignment matrix of high dimensional data can be represented by a low dimensional linear mapping of data. We also discover the connection between SEC and other clustering methods, such as spectral clustering, Clustering with local and global regularization, K-means and Discriminative K-means. The experiments on many real-world data sets show that SEC significantly outperforms the existing spectral clustering methods as well as K-means clustering related methods.

We show that k-means (Lloyd's algorithm) is equivalent to a variational EM approximation of a Gaussian Mixture Model (GMM) with isotropic Gaussians. The k-means algorithm is obtained if truncated posteriors are used as variational distributions. In contrast to the standard way to relate k-means and GMMs, we show that it is not required to consider the limit case of Gaussians with zero variance. There are a number of consequences following from our observation: (A) k-means can be shown to monotonously increase the free-energy associated with truncated distributions; (B) Using the free-energy, we can derive an explicit and compact formula of a lower GMM likelihood bound which uses the k-means objective as argument; (C) We can generalize k-means using truncated variational EM, and relate such generalizations to other k-means-like algorithms. In general, truncated variational EM provides a natural and quantitative link between k-means-like clustering and GMM clustering algorithms which may be very relevant for future theoretical as well as empirical studies.

K Means Clustering is an unsupervised learning algorithm that tries to cluster data based on their similarity. Unsupervised learning means that there is no outcome to be predicted, and the algorithm just tries to find patterns in the data. In'k' means clustering, we have the specify the number of clusters we want the data to be grouped into. The algorithm randomly assigns each observation to a cluster, and finds the centroid of each cluster. These two steps are repeated till the within cluster variation cannot be reduced any further.

Jitta, Aditya (University of Helsinki) | Klami, Arto (University of Helsinki)

Classical model-based partitional clustering algorithms, such ask-means or mixture of Gaussians, provide only loose and indirect control over the size of the resulting clusters. In this work, we present a family of probabilistic clustering models that can be steered towards clusters of desired size by providing a prior distribution over the possible sizes, allowing the analyst to fine-tune exploratory analysis or to produce clusters of suitable size for future down-stream processing.Our formulation supports arbitrary multimodal prior distributions, generalizing the previous work on clustering algorithms searching for clusters of equal size or algorithms designed for the microclustering task of finding small clusters. We provide practical methods for solving the problem, using integer programming for making the cluster assignments, and demonstrate that we can also automatically infer the number of clusters.