AITopics | Clustering

Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)

Robust Multiple Manifolds Structure Learning

Gong, Dian, Zhao, Xuemei, Medioni, Gerard

arXiv.org Machine Learning | Jun-18-2012

We present a robust multiple manifolds structure learning (RMMSL) scheme that robustly estimates data structures under the assumption of multiple manifolds with low intrinsic dimension. In the local learning stage, RMMSL efficiently estimates the local tangent space by weighted low-rank matrix factorization. In the global learning stage, we propose a robust manifold clustering method based on the local structure learning results. The proposed clustering method is designed to find the flattest manifold clusters by introducing a novel curved-level similarity function. Our approach is evaluated and compared to state-of-the-art methods on synthetic data, handwritten digit images, human motion capture data, and motorbike videos. We demonstrate the effectiveness of the proposed approach, which yields higher clustering accuracy and produces promising results on the challenging tasks of human motion segmentation and motion flow learning from videos.

artificial intelligence, machine learning, manifold, (13 more...)

arXiv.org Machine Learning

1206.4624

Country: North America > United States > California (0.46)

Genre: Research Report (0.84)

Industry:

Education (0.68)
Government > Regional Government (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.88)

Convex Multitask Learning with Flexible Task Clusters

Zhong, Wenliang, Kwok, James

arXiv.org Machine Learning | Jun-18-2012

Traditionally, multitask learning (MTL) assumes that all the tasks are related. This can lead to negative transfer when the tasks are in fact incoherent. Recently, a number of approaches have been proposed that alleviate this problem by discovering the underlying task clusters or relationships. However, they are limited to modeling these relationships at the task level, which may be restrictive in some applications. In this paper, we propose a novel MTL formulation that captures task relationships at the feature level. Depending on the interactions among tasks and features, the proposed method constructs different task clusters for different features, without needing to pre-specify the number of clusters. Computationally, the proposed formulation is strongly convex and can be efficiently solved by accelerated proximal methods. Experiments are performed on a number of synthetic and real-world data sets. Under various degrees of task relationships, the accuracy of the proposed method is consistently among the best. Moreover, the feature-specific task clusters obtained agree with the known or plausible task structures of the data.

artificial intelligence, learning, machine learning, (13 more...)

arXiv.org Machine Learning

1206.4601

Country:

Europe (1.00)
North America > United States (0.94)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.62)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.34)

Estimation and Clustering with Infinite Rankings

Meila, Marina, Bao, Le

arXiv.org Machine Learning | Jun-13-2012

This paper presents a natural extension of stagewise ranking to the case of infinitely many items. We introduce the infinite generalized Mallows model (IGM), describe its properties, and give procedures to estimate it from data. For estimation of multimodal distributions, we introduce the Exponential-Blurring-Mean-Shift nonparametric clustering algorithm. The experiments highlight the properties of the new model and demonstrate that infinite models can be simple, elegant, and practical.

algorithm, permutation, sufficient statistics, (15 more...)

arXiv.org Machine Learning

1206.327

Country:

North America > United States > Washington > King County > Seattle (0.14)
North America > United States > New York (0.04)
North America > United States > Massachusetts > Middlesex County > Reading (0.04)
(4 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.66)

Flexible Priors for Exemplar-based Clustering

Tarlow, Daniel, Zemel, Richard S., Frey, Brendan J.

arXiv.org Machine Learning | Jun-13-2012

Exemplar-based clustering methods have been shown to produce state-of-the-art results on a number of synthetic and real-world clustering problems. They are appealing because they offer computational benefits over latent-mean models and can handle arbitrary pairwise similarity measures between data points. However, when trying to recover underlying structure in clustering problems, tailored similarity measures are often not enough; we also desire control over the distribution of cluster sizes. Priors such as Dirichlet process priors allow the number of clusters to be unspecified while expressing priors over data partitions. To our knowledge, they have not been applied to exemplar-based models. We show how to incorporate priors, including Dirichlet process priors, into the recently introduced affinity propagation algorithm. We develop an efficient max-product belief propagation algorithm for our new model and demonstrate experimentally how the expanded range of clustering priors allows us to better recover true clusterings in situations where we have some information about the generating process.

algorithm, artificial intelligence, machine learning, (17 more...)

arXiv.org Machine Learning

1206.3294

Country: North America > Canada > Ontario > Toronto (0.15)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Topological graph clustering with thin position

Johnson, Jesse

arXiv.org Machine Learning | Jun-4-2012

A clustering algorithm partitions a set of data points into smaller sets (clusters) such that each subset is more tightly packed than the whole. Many approaches to clustering translate the vector data into a graph with edges reflecting a distance or similarity metric on the points, then look for highly connected subgraphs. We introduce such an algorithm based on ideas borrowed from the topological notion of thin position for knots and 3-dimensional manifolds.
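
The graph-based pipeline described in this abstract can be sketched in a few lines. The sketch below is only an illustration of the generic idea, not the thin-position algorithm of the paper: it assumes an epsilon-neighborhood graph on Euclidean distance and uses plain connected components as the simplest stand-in for "highly connected subgraphs". The function names `epsilon_graph` and `connected_components`, the sample points, and the threshold are all hypothetical.

```python
import math
from collections import deque


def epsilon_graph(points, eps):
    """Adjacency list linking points whose Euclidean distance is at most eps."""
    n = len(points)
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) <= eps:
                adj[i].append(j)
                adj[j].append(i)
    return adj


def connected_components(adj):
    """Label each vertex with the index of its connected component (BFS)."""
    labels = [-1] * len(adj)
    current = 0
    for start in range(len(adj)):
        if labels[start] != -1:
            continue  # already reached from an earlier start vertex
        labels[start] = current
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if labels[v] == -1:
                    labels[v] = current
                    queue.append(v)
        current += 1
    return labels


# Hypothetical toy data: two well-separated groups in the plane.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]
labels = connected_components(epsilon_graph(points, eps=0.5))
print(labels)
```

Connected components only separate groups that share no edge at all, so the result is sensitive to the choice of `eps`; methods such as the one in the abstract look for denser substructure within a connected graph.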

artificial intelligence, machine learning, pinch cluster, (15 more...)

arXiv.org Machine Learning

1206.0771

Country: North America > United States > Oklahoma (0.28)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Classifying Scientific Performance on a Metric-by-Metric Basis

Bell, Eric Belanga (Pacific Northwest National Laboratory) | Marshall, Eric (Pacific Northwest National Laboratory) | Hull, Ryan (Pacific Northwest National Laboratory) | Fligg, Keith (Pacific Northwest National Laboratory) | Sanfilippo, Antonio (Pacific Northwest National Laboratory) | Daly, Don (Pacific Northwest National Laboratory) | Engel, Dave (Pacific Northwest National Laboratory)

AAAI Conferences | May-20-2012

In this paper, we outline a system for evaluating the performance of scientific research across a number of outcome metrics (e.g., publications, sales, new hires). Our system is designed to classify research performance into a number of metrics, evaluate each metric’s performance using only data on other metrics, and cast predictions of future performance by metric. This study shows how data mining techniques can be used to provide a predictive analytic approach to the management of resources for scientific research.

metric, performance class, research project, (15 more...)

AAAI Conferences

Twenty-Fifth International FLAIRS Conference

Country:

North America > United States > Washington > Benton County > Richland (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Genre: Research Report (0.47)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.47)

Proper Noun Semantic Clustering Using Bag-of-Vectors

Ebadat, Ali Reza (INRIA-INSA) | Claveau, Vincent (IRISA-CNRS) | Sébillot, Pascale (IRISA-INS)

AAAI Conferences | May-20-2012

In this paper, we propose a model for semantic clustering of entities extracted from a text, and we apply it to a Proper Noun classification task. This model is based on a new method to compute the similarity between the entities. Indeed, the classical way of calculating similarity is to build a feature vector or Bag-of-Features for each entity and then use classical similarity functions like Cosine. In practice, the features are contextual, such as words around the different occurrences of each entity. Here, we propose to use an alternative representation for entities, called Bag-of-Vectors, or Bag-of-Bags-of-Features. In this new model, each entity is not defined as a unique vector but as a set of vectors, in which each vector is built based on the contextual features of one occurrence of the entity. In order to use Bag-of-Vectors for clustering, we introduce new versions of classical similarity functions such as Cosine and Scalar Products. Experimentally, we show that the Bag-of-Vectors representation always improves the clustering results compared to classical Bag-of-Features representations.
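
To make the contrast between the two representations concrete, here is a minimal sketch. It assumes sparse context vectors stored as dictionaries, and it aggregates over occurrences by averaging all pairwise cosines; the abstract does not specify how the paper's similarity functions are defined, so that averaging rule is an assumption for illustration only, as are the function names and the toy data.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def bag_of_features(occurrences):
    """Classical representation: merge all occurrence contexts into one vector."""
    merged = {}
    for occ in occurrences:
        for k, w in occ.items():
            merged[k] = merged.get(k, 0.0) + w
    return merged


def bag_of_vectors_similarity(a, b):
    """Assumed aggregation: mean cosine over all pairs of occurrence vectors."""
    if not a or not b:
        return 0.0
    total = sum(cosine(u, v) for u in a for v in b)
    return total / (len(a) * len(b))


# Hypothetical entities, each with one context vector per occurrence.
paris = [{"capital": 1.0, "france": 1.0}, {"hotel": 1.0, "visit": 1.0}]
london = [{"capital": 1.0, "england": 1.0}, {"visit": 1.0, "museum": 1.0}]

print(cosine(bag_of_features(paris), bag_of_features(london)))
print(bag_of_vectors_similarity(paris, london))
```

Keeping one vector per occurrence preserves which features co-occurred in the same context, information that the merged Bag-of-Features vector discards.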

computational linguistic, representation, similarity function, (12 more...)

AAAI Conferences

Twenty-Fifth International FLAIRS Conference

Country:

North America > United States (0.15)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > France (0.04)
Europe > Czechia > Prague (0.04)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.99)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.69)

Dynamic Behavioral Mixed-Membership Model for Large Evolving Networks

Rossi, Ryan, Gallagher, Brian, Neville, Jennifer, Henderson, Keith

arXiv.org Machine Learning | May-9-2012

The majority of real-world networks are dynamic and extremely large (e.g., Internet traffic, Twitter, Facebook, ...). To understand the structural behavior of nodes in these large dynamic networks, it may be necessary to model the dynamics of behavioral roles representing the main connectivity patterns over time. In this paper, we propose a dynamic behavioral mixed-membership model (DBMM) that captures the "roles" of nodes in the graph and how they evolve over time. Unlike other node-centric models, our model is scalable for analyzing large dynamic networks. In addition, DBMM is flexible, parameter-free, has no functional form or parameterization, and is interpretable (identifies explainable patterns). The performance results indicate that our approach can be applied to very large networks, while the experimental results show that our model uncovers interesting patterns underlying the dynamics of these networks.

artificial intelligence, machine learning, node, (17 more...)

arXiv.org Machine Learning

1205.2056

Genre: Research Report (0.70)

Industry: Information Technology > Services (0.48)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Graph-based Learning with Unbalanced Clusters

Qian, Jing, Saligrama, Venkatesh, Zhao, Manqi

arXiv.org Machine Learning | May-8-2012

Graph construction is a crucial step in spectral clustering (SC) and graph-based semi-supervised learning (SSL). Spectral methods applied on standard graphs such as full-RBF, $\epsilon$-graphs and $k$-NN graphs can lead to poor performance in the presence of proximal and unbalanced data. This is because spectral methods based on minimizing RatioCut or normalized cut on these graphs tend to put more importance on balancing cluster sizes over reducing cut values. We propose a novel graph construction technique and show that the RatioCut solution on this new graph is able to handle proximal and unbalanced data. Our method is based on adaptively modulating the neighborhood degrees in a $k$-NN graph, which tends to sparsify neighborhoods in low density regions. Our method adapts to data with varying levels of unbalancedness and can be naturally used for small cluster detection. We justify our ideas through limit cut analysis. Unsupervised and semi-supervised experiments on synthetic and real data sets demonstrate the superiority of our method.
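
The balancing effect mentioned in this abstract can be seen on a toy example. The sketch below builds a standard symmetric, unweighted k-NN graph (not the adaptive construction proposed in the paper) and evaluates the two-cluster RatioCut objective, cut(A, B) * (1/|A| + 1/|B|), for two candidate partitions. The helper names, the one-dimensional data, and the choice k = 2 are assumptions made for illustration.

```python
import math


def knn_graph(points, k):
    """Symmetric unweighted k-NN graph as a set of undirected edges (i, j), i < j."""
    edges = set()
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


def ratio_cut(edges, labels):
    """Two-cluster RatioCut: cut(A, B) * (1/|A| + 1/|B|) for labels in {0, 1}."""
    size_a = labels.count(0)
    size_b = labels.count(1)
    if size_a == 0 or size_b == 0:
        raise ValueError("both clusters must be non-empty")
    cut = sum(1 for i, j in edges if labels[i] != labels[j])
    return cut * (1.0 / size_a + 1.0 / size_b)


# Hypothetical 1-D data: a large cluster of six points and a nearby small one of two.
points = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,), (5.0,), (8.0,), (9.0,)]
edges = knn_graph(points, k=2)

true_split = [0, 0, 0, 0, 0, 0, 1, 1]  # follows the gap between 5 and 8
balanced = [0, 0, 0, 0, 1, 1, 1, 1]    # cuts through the large cluster

print(ratio_cut(edges, true_split))
print(ratio_cut(edges, balanced))
```

On this data both partitions cut the same number of edges, so the size terms decide the outcome and the balanced split receives the lower RatioCut value even though it slices through the large cluster.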

artificial intelligence, graph-based learning, machine learning, (17 more...)

arXiv.org Machine Learning

1205.1496

Country: North America > United States (0.28)

Genre:

Research Report (0.64)
Workflow (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Dissimilarity Clustering by Hierarchical Multi-Level Refinement

Conan-Guez, Brieuc, Rossi, Fabrice

arXiv.org Machine Learning | Apr-29-2012

In this paper, we introduce a new way of optimizing the natural extension to dissimilarity data of the quantization error used in k-means clustering. The proposed method is based on hierarchical clustering analysis combined with multilevel heuristic refinement. The method is computationally efficient and achieves better quantization errors than relational k-means.

algorithm, artificial intelligence, machine learning, (17 more...)

arXiv.org Machine Learning

1204.6509

Country:

Europe (0.47)
North America > United States > New York (0.14)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
