Clustering
Learning Transferable Subspace for Human Motion Segmentation
Wang, Lichen (Northeastern University) | Ding, Zhengming (Northeastern University) | Fu, Yun (Northeastern University)
Temporal data clustering is a challenging task. Existing methods usually explore data self-representation strategy, which may hinder the clustering performance in insufficient or corrupted data scenarios. In real-world applications, we are easily accessible to a large amount of related labeled data. To this end, we propose a novel transferable subspace clustering approach by exploring useful information from relevant source data to enhance clustering performance in target temporal data. We manage to transform the original data into a shared low-dimensional and distinctive feature space by jointly seeking an effective domain-invariant projection. In this way, the well-labeled source knowledge can help obtain a more discriminative target representation. Moreover, a graph regularizer is designed to incorporate temporal information to preserve more sequence knowledge into the learned representation. Extensive experiments based on three human motion datasets illustrate that our approach is able to outperform state-of-the-art temporal data clustering methods.
Consistent and Specific Multi-View Subspace Clustering
Luo, Shirui (Chinese Academy of Sciences) | Zhang, Changqing (University of Chinese Academy of Sciences; Institute of Information Engineering; School of Cyber Security) | Zhang, Wei (Tianjin University) | Cao, Xiaochun (Chinese Academy of Sciences; Institute of Information Engineering)
Multi-view clustering has attracted intensive attention due to the effectiveness of exploiting multiple views of data. However, most existing multi-view clustering methods only aim to explore the consistency or enhance the diversity of different views. In this paper, we propose a novel multi-view subspace clustering method (CSMSC), where consistency and specificity are jointly exploited for subspace representation learning. We formulate the multi-view self-representation property using a shared consistent representation and a set of specific representations, which better fits the real-world datasets. Specifically, consistency models the common properties among all views, while specificity captures the inherent difference in each view. In addition, to optimize the non-convex problem, we introduce a convex relaxation and develop an alternating optimization algorithm to recover the corresponding data representations. Experimental evaluations on four benchmark datasets demonstrate that the proposed approach achieves better performance over several state-of-the-arts.
Nonconvex Sparse Spectral Clustering by Alternating Direction Method of Multipliers and Its Convergence Analysis
Lu, Canyi (National University of Singapore) | Feng, Jiashi (National University of Singapore) | Lin, Zhouchen (Peking University) | Yan, Shuicheng (National University of Singapore)
Spectral Clustering (SC) is a widely used data clustering method which first learns a low-dimensional embedding U of data by computing the eigenvectors of the normalized Laplacian matrix, and then performs k-means on U T to get the final clustering result. The Sparse Spectral Clustering (SSC) method extends SC with a sparse regularization on UU T by using the block diagonal structure prior of UU T in the ideal case. However, encouraging UU T to be sparse leads to a heavily nonconvex problem which is challenging to solve and the work (Lu, Yan, and Lin 2016) proposes a convex relaxation in the pursuit of this aim indirectly. However, the convex relaxation generally leads to a loose approximation and the quality of the solution is not clear. This work instead considers to solve the nonconvex formulation of SSC which directly encourages UU T to be sparse. We propose an efficient Alternating Direction Method of Multipliers (ADMM) to solve the nonconvex SSC and provide the convergence guarantee. In particular, we prove that the sequences generated by ADMM always exist a limit point and any limit point is a stationary point. Our analysis does not impose any assumptions on the iterates and thus is practical. Our proposed ADMM for nonconvex problems allows the stepsize to be increasing but upper bounded, and this makes it very efficient in practice. Experimental analysis on several real data sets verifies the effectiveness of our method.
CoDiNMF: Co-Clustering of Directed Graphs via NMF
Lim, Woosang (Georgia Institute of Technology) | Du, Rundong (Georgia Institute of Technology) | Park, Haesun (Georgia Institute of Technology)
Co-clustering computes clusters of data items and the related features concurrently, and it has been used in many applications such as community detection, product recommendation, computer vision, and pricing optimization. In this paper, we propose a new co-clustering method, called CoDiNMF, which improves the clustering quality and finds directional patterns among co-clusters by using multiple directed and undirected graphs. We design the objective function of co-clustering by using min-cut criterion combined with an additional term which controls the sum of net directional flow between different co-clusters. In addition, we show that a variant of Nonnegative Matrix Factorization (NMF) can solve the proposed objective function effectively. We run experiments on the US patents and BlogCatalog data sets whose ground truth have been known, and show that CoDiNMF improves clustering results compared to other co-clustering methods in terms of average F1 score, Rand index, and adjusted Rand index (ARI). Finally, we compare CoDiNMF and other co-clustering methods on the Wikipedia data set of philosophers, and we can find meaningful directional flow of influence among co-clusters of philosophers.
Balanced Clustering via Exclusive Lasso: A Pragmatic Approach
Li, Zhihui (Beijing Etrol Technologies Co., Ltd.) | Nie, Feiping (Centre for OPTical Imagery Analysis and Learning, Northwestern Polytechnical University) | Chang, Xiaojun (School of Computer Science, Carnegie Mellon University) | Ma, Zhigang (School of Computer Science, Carnegie Mellon University) | Yang, Yi (Centre for Artificial Intelligence, University of Technology Sydney)
Clustering is an effective technique in data mining to generate groups that are the matter of interest.Among various clustering approaches, the family of k-means algorithms and min-cut algorithms gain most popularity due to their simplicity and efficacy. The classical k-means algorithm partitions a number of data points into several subsets by iteratively updating the clustering centers and the associated data points. By contrast, a weighted undirected graph is constructed in min-cut algorithms which partition the vertices of the graph into two sets. However, existing clustering algorithms tend to cluster minority of data points into a subset, which shall be avoided when the target dataset is balanced. To achieve more accurate clustering for balanced dataset, we propose to leverage exclusive lasso on k-means and min-cut to regulate the balance degree of the clustering results. By optimizing our objective functions that build atop the exclusive lasso, we can make the clustering result as much balanced as possible. Extensive experiments on several large-scale datasets validate the advantage of the proposed algorithms compared to the state-of-the-art clustering algorithms.
Online Clustering of Contextual Cascading Bandits
Li, Shuai (The Chinese University of Hong Kong) | Zhang, Shengyu (The Chinese University of Hong Kong)
We consider a new setting of online clustering of contextual cascading bandits, an online learning problem where the underlying cluster structure over users is unknown and needs to be learned from a random prefix feedback. More precisely, a learning agent recommends an ordered list of items to a user, who checks the list and stops at the first satisfactory item, if any. We propose an algorithm of CLUB-cascade for this setting and prove an n-step regret bound of order O(√n). Previous work corresponds to the degenerate case of only one cluster, and our general regret bound in this special case also significantly improves theirs. We conduct experiments on both synthetic and real data, and demonstrate the effectiveness of our algorithm and the advantage of incorporating online clustering method.
Unified Spectral Clustering With Optimal Graph
Kang, Zhao (University of Electronic Science and Technology of China) | Peng, Chong (Southern Illinois University) | Cheng, Qiang (University of Kentucky) | Xu, Zenglin (University of Electronic Science and Technology of China)
Spectral clustering has found extensive use in many areas. Most traditional spectral clustering algorithms work in three separate steps: similarity graph construction; continuous labels learning; discretizing the learned labels by k-means clustering. Such common practice has two potential flaws, which may lead to severe information loss and performance degradation. First, predefined similarity graph might not be optimal for subsequent clustering. It is well-accepted that similarity graph highly affects the clustering results. To this end, we propose to automatically learn similarity information from data and simultaneously consider the constraint that the similarity matrix has exact c connected components if there are c clusters. Second, the discrete solution may deviate from the spectral solution since k-means method is well-known as sensitive to the initialization of cluster centers. In this work, we transform the candidate solution into a new one that better approximates the discrete one. Finally, those three subtasks are integrated into a unified framework, with each subtask iteratively boosted by using the results of the others towards an overall optimal solution. It is known that the performance of a kernel method is largely determined by the choice of kernels. To tackle this practical problem of how to select the most suitable kernel for a particular data set, we further extend our model to incorporate multiple kernel learning ability. Extensive experiments demonstrate the superiority of our proposed method as compared to existing clustering approaches.
Metric-Based Auto-Instructor for Learning Mixed Data Representation
Jian, Songlei (National University of Defense Technology, University of Technology Sydney) | Hu, Liang (University of Technology Sydney) | Cao, Longbing (University of Technology Sydney) | Lu, Kai (National University of Defense Technology)
Mixed data with both categorical and continuous features are ubiquitous in real-world applications. Learning a good representation of mixed data is critical yet challenging for further learning tasks. Existing methods for representing mixed data often overlook the heterogeneous coupling relationships between categorical and continuous features as well as the discrimination between objects. To address these issues, we propose an auto-instructive representation learning scheme to enable margin-enhanced distance metric learning for a discrimination-enhanced representation. Accordingly, we design a metric-based auto-instructor (MAI) model which consists of two collaborative instructors. Each instructor captures the feature-level couplings in mixed data with fully connected networks, and guides the infinite-margin metric learning for the peer instructor with a contrastive order. By feeding the learned representation into both partition-based and density-based clustering methods, our experiments on eight UCI datasets show highly significant learning performance improvement and much more distinguishable visualization outcomes over the baseline methods.
Lagrangian Constrained Community Detection
Ganji, Mohadeseh (The University of Melbourne) | Bailey, James (The University of Melbourne) | Stuckey, Peter J. (The University of Melbourne)
Semi-supervised or constrained community detection incorporates side information to findcommunities of interest in complex networks. The supervision is often represented as constraints such as known labels and pairwise constraints. Existing constrained community detection approaches often fail to fully benefit from the available side information. This results in poor performance for scenarios such as: when the constraints are required to be fully satisfied, when there is a high confidence about the correctness of the supervision information, and in situations where the side information is expensive or hard to achieve and is only available in a limited amount. In this paper, we propose a new constrained community detection algorithm based on Lagrangian multipliers to incorporate and fully satisfy the instance level supervisio nconstraints. Our proposed algorithm can more fully utilise available side information and find better quality solutions. Our experiments on real and synthetic data sets show our proposed LagCCD algorithm outperforms existing algorithms in terms of solution quality, ability to satisfy the constraints and noise resistance.
Clustering Small Samples With Quality Guarantees: Adaptivity With One2all PPS
Cohen, Edith (Google Research, Tel Aviv University) | Chechik, Shiri (Tel Aviv University) | Kaplan, Haim (Tel Aviv University)
Clustering of data points is a fundamental tool in data analysis. We consider points X in a relaxed metric space, where the triangle inequality holds within a constant factor. A clustering of X is a partition of X defined by a set of points Q (centroids), according to the closest centroid. The cost of clustering X by Q is V ( Q )= ∑ x ∈ X d xQ . This formulation generalizes classic k- means clustering, which uses squared distances. Two basic tasks, parametrized by k ≥ 1, are cost estimation, which returns (approximate) V ( Q ) for queries Q such that | Q | = k and clustering, which returns an (approximate) minimizer of V ( Q ) of size | Q |= k . When the data set X is very large, we seek efficient constructions of small samples that can act as surrogates for performing these tasks. Existing constructions that provide quality guarantees, however, are either worst-case, and unable to benefit from structure of real data sets, or make explicit strong assumptions on the structure. We show here how to avoid both these pitfalls using adaptive designs. The core of our design are the novel one2all probabilities, computed for a set M of centroids and α ≥ 1: The clustering cost of each Q with cost V ( Q ) ≥ V(M)/α can be estimated well from a sample of size O (α | M | ε -2 ). For cost estimation, we apply one2all with a bicriteria approximate M , while adaptively balancing | M | and α to optimize sample size per quality. For clustering, we present a wrapper that adaptively applies a base clustering algorithm to a sample S, using the smallest sample that provides the desired statistical guarantees on quality. We demonstrate experimentally the huge gains of using our adaptive instead of worst-case methods.