Goto

Collaborating Authors

 Clustering


Robust Multiple Manifolds Structure Learning

arXiv.org Machine Learning

We present a robust multiple manifolds structure learning (RMMSL) scheme to robustly estimate data structures under the multiple low intrinsic dimensional manifolds assumption. In the local learning stage, RMMSL efficiently estimates local tangent space by weighted low-rank matrix factorization. In the global learning stage, we propose a robust manifold clustering method based on local structure learning results. The proposed clustering method is designed to get the flattest manifolds clusters by introducing a novel curved-level similarity function. Our approach is evaluated and compared to state-of-the-art methods on synthetic data, handwritten digit images, human motion capture data and motorbike videos. We demonstrate the effectiveness of the proposed approach, which yields higher clustering accuracy, and produces promising results for challenging tasks of human motion segmentation and motion flow learning from videos.


Convex Multitask Learning with Flexible Task Clusters

arXiv.org Machine Learning

Traditionally, multitask learning (MTL) assumes that all the tasks are related. This can lead to negative transfer when tasks are indeed incoherent. Recently, a number of approaches have been proposed that alleviate this problem by discovering the underlying task clusters or relationships. However, they are limited to modeling these relationships at the task level, which may be restrictive in some applications. In this paper, we propose a novel MTL formulation that captures task relationships at the feature-level. Depending on the interactions among tasks and features, the proposed method construct different task clusters for different features, without even the need of pre-specifying the number of clusters. Computationally, the proposed formulation is strongly convex, and can be efficiently solved by accelerated proximal methods. Experiments are performed on a number of synthetic and real-world data sets. Under various degrees of task relationships, the accuracy of the proposed method is consistently among the best. Moreover, the feature-specific task clusters obtained agree with the known/plausible task structures of the data.


Estimation and Clustering with Infinite Rankings

arXiv.org Machine Learning

This paper presents a natural extension of stagewise ranking to the the case of infinitely many items. We introduce the infinite generalized Mallows model (IGM), describe its properties and give procedures to estimate it from data. For estimation of multimodal distributions we introduce the Exponential-Blurring-Mean-Shift nonparametric clustering algorithm. The experiments highlight the properties of the new model and demonstrate that infinite models can be simple, elegant and practical.


Flexible Priors for Exemplar-based Clustering

arXiv.org Machine Learning

Exemplar-based clustering methods have been shown to produce state-of-the-art results on a number of synthetic and real-world clustering problems. They are appealing because they offer computational benefits over latent-mean models and can handle arbitrary pairwise similarity measures between data points. However, when trying to recover underlying structure in clustering problems, tailored similarity measures are often not enough; we also desire control over the distribution of cluster sizes. Priors such as Dirichlet process priors allow the number of clusters to be unspecified while expressing priors over data partitions. To our knowledge, they have not been applied to exemplar-based models. We show how to incorporate priors, including Dirichlet process priors, into the recently introduced affinity propagation algorithm. We develop an efficient maxproduct belief propagation algorithm for our new model and demonstrate experimentally how the expanded range of clustering priors allows us to better recover true clusterings in situations where we have some information about the generating process.


Topological graph clustering with thin position

arXiv.org Machine Learning

A clustering algorithm partitions a set of data points into smaller sets (clusters) such that each subset is more tightly packed than the whole. Many approaches to clustering translate the vector data into a graph with edges reflecting a distance or similarity metric on the points, then look for highly connected subgraphs. We introduce such an algorithm based on ideas borrowed from the topological notion of thin position for knots and 3-dimensional manifolds.


Classifying Scientific Performance on a Metric-by-Metric Basis

AAAI Conferences

In this paper, we outline a system for evaluating the performance of scientific research across a number of outcome metrics (e.g. publications, sales, new hires). Our system is designed to classify research performance into a number of metrics, evaluate each metricโ€™s performance using only data on other metrics, and to cast predictions of future performance by metric. This study shows how data mining techniques can be used to provide a predictive analytic approach to the management of resources for scientific research.


Proper Noun Semantic Clustering Using Bag-of-Vectors

AAAI Conferences

In this paper, we propose a model for semantic clustering of entities extracted from a text, and we apply it to a Proper Noun classification task.This model is based on a new method to compute the similarity between the entities.Indeed, the classical way of calculating similarity is to build a feature vector or Bag-of-Features for each entity and then use classical similarity functions like Cosine.In practice, the features are contextual, such as words around the different occurrences of each entity. Here, we propose to use an alternative representation for entities, called Bag-of-Vectors, or Bag-of-Bags-of-Features.In this new model, each entity is not defined as a unique vector but as a set of vectors, in which each vector is built based on the contextual features of one occurrence of the entity.In order to use Bag-of-Vectors for clustering, we introduce new versions of classical similarity functions such as Cosine and Scalar Products. Experimentally, we show that the Bag-of-Vectors representation always improve the clustering results compared to classical Bag-of-Features representations.


Dynamic Behavioral Mixed-Membership Model for Large Evolving Networks

arXiv.org Machine Learning

The majority of real-world networks are dynamic and extremely large (e.g., Internet Traffic, Twitter, Facebook,...). To understand the structural behavior of nodes in these large dynamic networks, it may be necessary to model the dynamics of behavioral roles representing the main connectivity patterns over time. In this paper, we propose a dynamic behavioral mixed-membership model (DBMM) that captures the "roles" of nodes in the graph and how they evolve over time. Unlike other node-centric models, our model is scalable for analyzing large dynamic networks. In addition, DBMM is flexible, parameter-free, has no functional form or parameterization, and is interpretable (identifies explainable patterns). The performance results indicate our approach can be applied to very large networks while the experimental results show that our model uncovers interesting patterns underlying the dynamics of these networks.


Graph-based Learning with Unbalanced Clusters

arXiv.org Machine Learning

Graph construction is a crucial step in spectral clustering (SC) and graph-based semi-supervised learning (SSL). Spectral methods applied on standard graphs such as full-RBF, $\epsilon$-graphs and $k$-NN graphs can lead to poor performance in the presence of proximal and unbalanced data. This is because spectral methods based on minimizing RatioCut or normalized cut on these graphs tend to put more importance on balancing cluster sizes over reducing cut values. We propose a novel graph construction technique and show that the RatioCut solution on this new graph is able to handle proximal and unbalanced data. Our method is based on adaptively modulating the neighborhood degrees in a $k$-NN graph, which tends to sparsify neighborhoods in low density regions. Our method adapts to data with varying levels of unbalancedness and can be naturally used for small cluster detection. We justify our ideas through limit cut analysis. Unsupervised and semi-supervised experiments on synthetic and real data sets demonstrate the superiority of our method.


Dissimilarity Clustering by Hierarchical Multi-Level Refinement

arXiv.org Machine Learning

We introduce in this paper a new way of optimizing the natural extension of the quantization error using in k-means clustering to dissimilarity data. The proposed method is based on hierarchical clustering analysis combined with multilevel heuristic refinement. The method is computationally efficient and achieves better quantization errors than the relational k-means.