Clustering
Stochastic Blockmodeling for Online Advertising
Chen, Li (Johns Hopkins University) | Patton, Matthew (AOL Advertising.com)
Online advertising is an important and huge industry. Having knowledge of the website attributes can contribute greatly to business strategies for ad-targeting, content display, inventory purchase or revenue prediction. In this paper, we introduce a stochastic blockmodeling for the website relations induced by the event of online user visitation. We propose two clustering algorithms to discover the intrinsic structures of websites, and compare the performance with a goodness-of-fit method and a deterministic graph partitioning method. We demonstrate the effectiveness of our algorithms on both simulation and AOL website dataset.
Integrating Image Clustering and Codebook Learning
Xie, Pengtao (Carnegie Mellon University) | Xing, Eric P. (Carnegie Mellon University)
Image clustering and visual codebook learning are two fundamental problems in computer vision and they are tightly related. On one hand, a good codebook can generate effective feature representations which largely affect clustering performance. On the other hand, class labels obtained from image clustering can serve as supervised information to guide codebook learning. Traditionally, these two processes are conducted separately and their correlation is generally ignored.In this paper, we propose a Double Layer Gaussian Mixture Model (DLGMM) to simultaneously perform image clustering and codebook learning. In DLGMM, two tasks are seamlessly coupled and can mutually promote each other. Cluster labels and codebook are jointly estimated to achieve the overall best performance. To incorporate the spatial coherence between neighboring visual patches, we propose a Spatially Coherent DLGMM which uses a Markov Random Field to encourage neighboring patches to share the same visual word label.We use variational inference to approximate the posterior of latent variables and learn model parameters.Experiments on two datasets demonstrate the effectiveness of two models.
Robust Subspace Clustering via Thresholding Ridge Regression
Peng, Xi (Institute for Infocomm Research) | Yi, Zhang (Sichuan University) | Tang, Huajin (Institute for Infocomm Research)
Given a data set from a union of multiple linear subspaces, a robust subspace clustering algorithm fits each group of data points with a low-dimensional subspace and then clusters these data even though they are grossly corrupted or sampled from the union of dependent subspaces. Under the framework of spectral clustering, recent works using sparse representation, low rank representation and their extensions achieve robust clustering results by formulating the errors (e.g., corruptions) into their objective functions so that the errors can be removed from the inputs. However, these approaches have suffered from the limitation that the structure of the errors should be known as the prior knowledge. In this paper, we present a new method of robust subspace clustering by eliminating the effect of the errors from the projection space (representation) rather than from the input space. We firstly prove that ell_1-, ell_2-, and ell_infty-norm-based linear projection spaces share the property of intra-subspace projection dominance, i.e., the coefficients over intra-subspace data points are larger than those over inter-subspace data points. Based on this property, we propose a robust and efficient subspace clustering algorithm, called Thresholding Ridge Regression (TRR). TRR calculates the ell2-norm-based coefficients of a given data set and performs a hard thresholding operator; and then the coefficients are used to build a similarity graph for clustering. Experimental studies show that TRR outperforms the state-of-the-art methods with respect to clustering quality, robustness, and time-saving.
SoF: Soft-Cluster Matrix Factorization for Probabilistic Clustering
Zhao, Han (University of Waterloo) | Poupart, Pascal (University of Waterloo) | Zhang, Yongfeng (Tsinghua University) | Lysy, Martin (University of Waterloo)
We propose SoF (Soft-cluster matrix Factorization), a probabilistic clustering algorithm which softly assigns each data point into clusters. Unlike model-based clustering algorithms, SoF does not make assumptions about the data density distribution. Instead, we take an axiomatic approach to define 4 properties that the probability of co-clustered pairs of points should satisfy. Based on the properties, SoF utilizes a distance measure between pairs of points to induce the conditional co-cluster probabilities. The objective function in our framework establishes an important connection between probabilistic clustering and constrained symmetric Nonnegative Matrix Factorization (NMF), hence providing a theoretical interpretation for NMF-based clustering algorithms. To optimize the objective, we derive a sequential minimization algorithm using a penalty method. Experimental results on both synthetic and real-world datasets show that SoF significantly outperforms previous NMF-based algorithms and that it is able to detect non-convex patterns as well as cluster boundaries.
Constrained NMF-Based Multi-View Clustering on Unmapped Data
Zhang, Xianchao (Dalian University of Technology) | Zong, Linlin (Dalian University of Technology) | Liu, Xinyue (Dalian University of Technology) | Yu, Hong (Dalian University of Technology)
We use the disagreement between the Multi-view clustering gains increasing attention in the past views to guide the factorization of the matrices. The overall decade (Bickel and Scheffer 2004) (Kumar and III 2011) objective of our algorithm is to minimize the loss function of (Kumar, Rai, and III 2011) (Liu et al. 2013) (Blaschko and NMF in each view as well as the disagreement between each Lampert 2008) (Chaudhuri et al. 2009) (Tzortzis and Likas pair of views. Experimental results show that, with a small 2012). Most existing multi-view clustering algorithms require number of constraints, the proposed CMVNMF (Constrained that the data is completely mapped, i.e., every object Multi-View clustering based on NMF) algorithm gets good has representations in all the views, representations from different performance on unmapped data, and outperforms existing views representing a same object are exactly known, algorithms on partially mapped data and completely mapped and the representations of the same object have the same data.
Learning Robust Locality Preserving Projection via p-Order Minimization
Wang, Hua (Colorado School of Mines) | Nie, Feiping (University of Texas at Arlington) | Huang, Heng (University of Texas at Arlington)
Locality preserving projection (LPP) is an effective dimensionality reduction method based on manifold learning, which is defined over the graph weighted squared L2-norm distances in the projected subspace. Since squared L2-norm distance is prone to outliers, it is desirable to develop a robust LPP method. In this paper, motivated by existing studies that improve the robustness of statistical learning models via L1-norm or not-squared L2-norm formulations, we propose a robust LPP (rLPP) formulation to minimize the p-th order of the L2-norm distances, which can better tolerate large outlying data samples because it suppress the introduced biased more than the L1-norm or not squared L2-norm minimizations. However, solving the formulated objective is very challenging because it not only non-smooth but also non-convex. As an important theoretical contribution of this work, we systematically derive an efficient iterative algorithm to solve the general p-th order L2-norm minimization problem, which, to the best of our knowledge, is solved for the first time in literature. Extensive empirical evaluations on the proposed rLPP method have been performed, in which our new method outperforms the related state-of-the-art methods in a variety of experimental settings and demonstrate its effectiveness in seeking better subspaces on both noiseless and noisy data.
Unidimensional Clustering of Discrete Data Using Latent Tree Models
Liu, April H. (The Hong Kong University of Science and Technology) | Poon, Leonard K.M. ( The Hong Kong Institute of Education ) | Zhang, Nevin L. (The Hong Kong University of Science and Technology)
This paper is concerned with model-based clustering of discrete data. Latent class models (LCMs) are usually used for the task. An LCM consists of a latent variable and a number of attributes. It makes the overly restrictive assumption that the attributes are mutually independent given the latent variable. We propose a novel method to relax the assumption. The key idea is to partition the attributes into groups such that correlations among the attributes in each group can be properly modeled by using one single latent variable. The latent variables for the attribute groups are then used to build a number of models and one of them is chosen to produce the clustering results. Extensive empirical studies have been conducted to compare the new method with LCM and several other methods (K-means, kernel K-means and spectral clustering) that are not model-based. The new method outperforms the alternative methods in most cases and the differences are often large.
Large-Scale Multi-View Spectral Clustering via Bipartite Graph
Li, Yeqing (University of Texas at Arlington) | Nie, Feiping (University of Texas at Arlington) | Huang, Heng (University of Texas at Arlington) | Huang, Junzhou (University of Texas at Arlington)
In this paper, we address the problem of large-scale multi-view spectral clustering. In many real-world applications, data can be represented in various heterogeneous features or views. Different views often provide different aspects of information that are complementary to each other. Several previous methods of clustering have demonstrated that better accuracy can be achieved using integrated information of all the views than just using each view individually. One important class of such methods is multi-view spectral clustering, which is based on graph Laplacian. However, existing methods are not applicable to large-scale problem for their high computational complexity. To this end, we propose a novel large-scale multi-view spectral clustering approach based on the bipartite graph. Our method uses local manifold fusion to integrate heterogeneous features. To improve efficiency, we approximate the similarity graphs using bipartite graphs. Furthermore, we show that our method can be easily extended to handle the out-of-sample problem. Extensive experimental results on five benchmark datasets demonstrate the effectiveness and efficiency of the proposed method, where our method runs up to nearly 3000 times faster than the state-of-the-art methods.
Maximin Separation Probability Clustering
Huang, Gao (Tsinghua University) | Zhang, Jianwen (Microsoft) | Song, Shiji (Tsinghua University) | Chen, Zheng (Microsoft)
This paper proposes a new approach for discriminative clustering. The intuition is, for a good clustering, one should be able to learn a classifier from the clustering labels with high generalization accuracy. Thus we define a novel metric to evaluate the quality of a clustering labeling, named Minimum Separation Probability (MSP), which is a lower bound of the generalization accuracy of a classifier learnt from the clustering labeling. We take MSP as the objective to maximize and propose our approach Maximin Separation Probability Clustering (MSPC), which has several attractive properties, such as invariance to anisotropic feature scaling and intuitive probabilistic explanation for clustering quality. We present three efficient optimization strategies for MSPC, and analyze their interesting connections to existing clustering approaches, such as maximum margin clustering (MMC) and discriminative k-means. Empirical results on real world data sets verify that MSP is a robust and effective clustering quality measure. It is also shown that the proposed algorithms compare favorably to state-of-the-art clustering algorithms in both accuracy and efficiency.
A Convex Formulation for Spectral Shrunk Clustering
Chang, Xiaojun (University of Technology Sydney) | Nie, Feiping (University of Texas at Arlington) | Ma, Zhigang (Carnegie Mellon University) | Yang, Yi (University of Technology Sydney) | Zhou, Xiaofang (The University of Queensland)
Spectral clustering is a fundamental technique in the field of data mining and information processing. Most existing spectral clustering algorithms integrate dimensionality reduction into the clustering process assisted by manifold learning in the original space. However, the manifold in reduced-dimensional subspace is likely to exhibit altered properties in contrast with the original space. Thus, applying manifold information obtained from the original space to the clustering process in a low-dimensional subspace is prone to inferior performance. Aiming to address this issue, we propose a novel convex algorithm that mines the manifold structure in the low-dimensional subspace. In addition, our unified learning process makes the manifold learning particularly tailored for the clustering. Compared with other related methods, the proposed algorithm results in more structured clustering result. To validate the efficacy of the proposed algorithm, we perform extensive experiments on several benchmark datasets in comparison with some state-of-the-art clustering approaches. The experimental results demonstrate that the proposed algorithm has quite promising clustering performance.