AITopics | Clustering

Collaborating Authors

Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)

News Overviews Instructional Materials AI-Alerts Classics

Approximate Correspondences in High Dimensions

Grauman, Kristen, Darrell, Trevor

Neural Information Processing SystemsDec-31-2007

Pyramid intersection is an efficient method for computing an approximate partial matching between two sets of feature vectors. We introduce a novel pyramid embedding basedon a hierarchy of non-uniformly shaped bins that takes advantage of the underlying structure of the feature space and remains accurate even for sets with high-dimensional feature vectors. The matching similarity is computed in linear time and forms a Mercer kernel. Whereas previous matching approximation algorithmssuffer from distortion factors that increase linearly with the feature dimension, we demonstrate that our approach can maintain constant accuracy even as the feature dimension increases. When used as a kernel in a discriminative classifier,our approach achieves improved object recognition results over a state-of-the-art set kernel.

artificial intelligence, machine learning, pyramid, (18 more...)

Neural Information Processing Systems

Country: North America > United States > New York (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

Add feedback

Clustering Under Prior Knowledge with Application to Image Segmentation

Cheng, Dong S., Murino, Vittorio, Figueiredo, Mário

Neural Information Processing SystemsDec-31-2007

This paper proposes a new approach to model-based clustering under prior knowledge.

artificial intelligence, machine learning, segmentation, (17 more...)

Neural Information Processing Systems

Country: North America > United States > California (0.14)

Technology:

Information Technology > Data Science (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.34)

Add feedback

PG-means: learning the number of clusters in data

Feng, Yu, Hamerly, Greg

Neural Information Processing SystemsDec-31-2007

We present a novel algorithm called PG-means which is able to learn the number of clusters in a classical Gaussian mixture model. Our method is robust and efficient; ituses statistical hypothesis tests on one-dimensional projections of the data and model to determine if the examples are well represented by the model. In so doing, we are applying a statistical test for the entire model at once, not just on a per-cluster basis. We show that our method works well in difficult cases such as non-Gaussian data, overlapping clusters, eccentric clusters, high dimension, and many true clusters. Further, our new method provides a much more stable estimate of the number of clusters than existing methods.

artificial intelligence, dataset, machine learning, (18 more...)

Neural Information Processing Systems

Country: North America > United States (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Automatic Pattern Classification by Unsupervised Learning Using Dimensionality Reduction of Data with Mirroring Neural Networks

Deepthi, Dasika Ratna, Krishna, G. R. Aditya, Eswaran, K.

arXiv.org Artificial IntelligenceDec-6-2007

This paper proposes an unsupervised learning technique by using Multi-layer Mirroring Neural Network and Forgy's clustering algorithm. Multi-layer Mirroring Neural Network is a neural network that can be trained with generalized data inputs (different categories of image patterns) to perform non-linear dimensionality reduction and the resultant low-dimensional code is used for unsupervised pattern classification using Forgy's algorithm. By adapting the non-linear activation function (modified sigmoidal function) and initializing the weights and bias terms to small random values, mirroring of the input pattern is initiated. In training, the weights and bias terms are changed in such a way that the input presented is reproduced at the output by back propagating the error. The mirroring neural network is capable of reducing the input vector to a great degree (approximately 1/30th the original size) and also able to reconstruct the input pattern at the output layer from this reduced code units. The feature set (output of central hidden layer) extracted from this network is fed to Forgy's algorithm, which classify input data patterns into distinguishable classes. In the implementation of Forgy's algorithm, initial seed points are selected in such a way that they are distant enough to be perfectly grouped into different categories. Thus a new method of unsupervised learning is formulated and demonstrated in this paper. This method gave impressive results when applied to classification of different image patterns.

artificial intelligence, machine learning, seed point, (16 more...)

arXiv.org Artificial Intelligence

0712.0938

Country:

North America > United States > New York (0.04)
North America > United States > Michigan (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
(5 more...)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.64)

Add feedback

Analyzing covert social network foundation behind terrorism disaster

Maeno, Yoshiharu, Ohsawa, Yukio

arXiv.org Artificial IntelligenceOct-23-2007

This paper addresses a method to analyze the covert social network foundation hidden behind the terrorism disaster. It is to solve a node discovery problem, which means to discover a node, which functions relevantly in a social network, but escaped from monitoring on the presence and mutual relationship of nodes. The method aims at integrating the expert investigator's prior understanding, insight on the terrorists' social network nature derived from the complex graph theory, and computational data processing. The social network responsible for the 9/11 attack in 2001 is used to execute simulation experiment to evaluate the performance of the method.

conspirator, data mining, machine learning, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.1504/IJSSCI.2009.024936

0710.4231

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.16)
Asia > Japan > Honshū > Kantō > Ibaraki Prefecture > Tsukuba (0.04)
North America > United States > New York (0.04)
(3 more...)

Genre: Research Report (0.50)

Industry:

Law Enforcement & Public Safety > Terrorism (1.00)
Information Technology > Services (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Add feedback

Semantic distillation: a method for clustering objects by their contextual specificity

Sierocinski, Thomas, Béchec, Anthony Le, Théret, Nathalie, Petritis, Dimitri

arXiv.org Machine LearningOct-6-2007

Techniques for data-mining, latent semantic analysis, contextual search of databases, etc. have long ago been developed by computer scientists working on information retrieval (IR). Experimental scientists, from all disciplines, having to analyse large collections of raw experimental data (astronomical, physical, biological, etc.) have developed powerful methods for their statistical analysis and for clustering, categorising, and classifying objects. Finally, physicists have developed a theory of quantum measurement, unifying the logical, algebraic, and probabilistic aspects of queries into a single formalism. The purpose of this paper is twofold: first to show that when formulated at an abstract level, problems from IR, from statistical data analysis, and from physical measurement theories are very similar and hence can profitably be cross-fertilised, and, secondly, to propose a novel method of fuzzy hierarchical clustering, termed \textit{semantic distillation} -- strongly inspired from the theory of quantum measurement --, we developed to analyse raw data coming from various types of experiments on DNA arrays. We illustrate the method by analysing DNA arrays experiments and clustering the genes of the array according to their specificity.

graph, information, representation, (15 more...)

arXiv.org Machine Learning

0710.1203

Country:

North America > United States > New York (0.05)
Europe > Netherlands > South Holland > Dordrecht (0.05)
Oceania > New Zealand > South Island > Otago > Dunedin (0.04)
(5 more...)

Genre: Research Report (0.70)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Data Science > Data Mining (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.49)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.48)

Add feedback

Mixed membership stochastic blockmodels

Airoldi, Edoardo M, Blei, David M, Fienberg, Stephen E, Xing, Eric P

arXiv.org Machine LearningMay-30-2007

Observations consisting of measurements on relationships for pairs of objects arise in many settings, such as protein interaction and gene regulatory networks, collections of author-recipient email, and social networks. Analyzing such data with probabilisic models can be delicate because the simple exchangeability assumptions underlying many boilerplate models no longer hold. In this paper, we describe a latent variable model of such data called the mixed membership stochastic blockmodel. This model extends blockmodels for relational data to ones which capture mixed membership latent relational structure, thus providing an object-specific low-dimensional representation. We develop a general variational inference algorithm for fast approximate posterior inference. We explore applications to social and protein interaction networks.

artificial intelligence, bayesian inference, machine learning, (16 more...)

arXiv.org Machine Learning

0705.4485

Country: North America > United States (1.00)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Education (0.67)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Communications (0.89)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Add feedback

Truecluster matching

Oehlschlägel, Jens

arXiv.org Artificial IntelligenceMay-29-2007

Cluster matching by permuting cluster labels is important in many clustering contexts such as cluster validation and cluster ensemble techniques. The classic approach is to minimize the euclidean distance between two cluster solutions which induces inappropriate stability in certain settings. Therefore, we present the truematch algorithm that introduces two improvements best explained in the crisp case. First, instead of maximizing the trace of the cluster crosstable, we propose to maximize a chi-square transformation of this crosstable. Thus, the trace will not be dominated by the cells with the largest counts but by the cells with the most non-random observations, taking into account the marginals. Second, we suggest a probabilistic component in order to break ties and to make the matching algorithm truly random on random data. The truematch algorithm is designed as a building block of the truecluster framework and scales in polynomial time. First simulation results confirm that the truematch algorithm gives more consistent truecluster results for unequal cluster sizes. Free R software is available.

algorithm, artificial intelligence, machine learning, (15 more...)

arXiv.org Artificial Intelligence

0705.4302

Country:

Europe > Austria > Vienna (0.05)
North America > United States > New York (0.04)
Europe > Hungary > Budapest > Budapest (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.48)

Add feedback

Truecluster: robust scalable clustering with model selection

Oehlschlägel, Jens

arXiv.org Artificial IntelligenceMay-28-2007

Data-based classification is fundamental to most branches of science. While recent years have brought enormous progress in various areas of statistical computing and clustering, some general challenges in clustering remain: model selection, robustness, and scalability to large datasets. We consider the important problem of deciding on the optimal number of clusters, given an arbitrary definition of space and clusteriness. We show how to construct a cluster information criterion that allows objective model selection. Differing from other approaches, our truecluster method does not require specific assumptions about underlying distributions, dissimilarity definitions or cluster models. Truecluster puts arbitrary clustering algorithms into a generic unified (sampling-based) statistical framework. It is scalable to big datasets and provides robust cluster assignments and case-wise diagnostics. Truecluster will make clustering more objective, allows for automation, and will save time and costs. Free R software is available.

artificial intelligence, base cluster algorithm, machine learning, (15 more...)

arXiv.org Artificial Intelligence

cs/0601001

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Europe > Austria > Vienna (0.14)
North America > United States > New York (0.05)
(5 more...)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

Closed-Loop Learning of Visual Control Policies

Jodogne, S. R., Piater, J. H.

Journal of Artificial Intelligence ResearchMar-26-2007

In this paper we present a general, flexible framework for learning mappings from images to actions by interacting with the environment. The basic idea is to introduce a feature-based image classifier in front of a reinforcement learning algorithm. The classifier partitions the visual space according to the presence or absence of few highly informative local descriptors that are incrementally selected in a sequence of attempts to remove perceptual aliasing. We also address the problem of fighting overfitting in such a greedy algorithm. Finally, we show how high-level visual features can be generated when the power of local descriptors is insufficient for completely disambiguating the aliased states. This is done by building a hierarchy of composite features that consist of recursive spatial combinations of visual features. We demonstrate the efficacy of our algorithms by solving three visual navigation tasks and a visual version of the classical ``Car on the Hill'' control problem.

algorithm, learning, visual feature, (15 more...)

Journal of Artificial Intelligence Research

doi: 10.1613/jair.2110

AI Access Foundation

10491

Journal of Artificial Intelligence Research

Country: