AITopics | Clustering

Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)

A consistent adjacency spectral embedding for stochastic blockmodel graphs

Sussman, Daniel L., Tang, Minh, Fishkind, Donniell E., Priebe, Carey E.

arXiv.org Machine Learning, Apr-27-2012

We present a method to estimate block membership of nodes in a random graph generated by a stochastic blockmodel. We use an embedding procedure motivated by the random dot product graph model, a particular example of the latent position model. The embedding associates each node with a vector; these vectors are clustered via minimization of a square error criterion. We prove that this method is consistent for assigning nodes to blocks, as only a negligible number of nodes will be mis-assigned. We prove consistency of the method for directed and undirected graphs. The consistent block assignment makes possible consistent parameter estimation for a stochastic blockmodel. We extend the result to the setting where the number of blocks grows slowly with the number of nodes. Our method is also computationally feasible even for very large graphs. We compare our method to Laplacian spectral clustering through analysis of simulated data and a graph derived from Wikipedia documents.
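The pipeline this abstract describes (embed each node as a vector, then cluster the vectors under a squared-error criterion) can be illustrated with a minimal sketch. It assumes NumPy, an undirected two-block graph, and a hand-written k-means with farthest-point initialisation; the function names, block probabilities, and graph size are illustrative assumptions, not the authors' code or settings.

```python
import numpy as np

def sample_sbm(block_sizes, B, rng):
    """Sample a symmetric adjacency matrix (no self-loops) from a stochastic blockmodel."""
    labels = np.repeat(np.arange(len(block_sizes)), block_sizes)
    P = B[labels][:, labels]                      # edge probability for every node pair
    upper = np.triu(rng.random(P.shape) < P, k=1) # sample each undirected edge once
    A = (upper | upper.T).astype(float)
    return A, labels

def adjacency_spectral_embedding(A, d):
    """Embed nodes using the d largest-magnitude eigenpairs of A, scaled by sqrt(|eigenvalue|)."""
    vals, vecs = np.linalg.eigh(A)
    top = np.argsort(np.abs(vals))[-d:]
    return vecs[:, top] * np.sqrt(np.abs(vals[top]))

def kmeans(X, k, n_iter=50):
    """Lloyd's algorithm with deterministic farthest-point initialisation."""
    centers = [X[0]]
    for _ in range(k - 1):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dist.argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = X[assign == j].mean(axis=0)
    return assign

rng = np.random.default_rng(0)
B = np.array([[0.5, 0.1], [0.1, 0.5]])            # assumed block connection probabilities
A, truth = sample_sbm([100, 100], B, rng)
X_hat = adjacency_spectral_embedding(A, d=2)
pred = kmeans(X_hat, k=2)

# Cluster labels are arbitrary, so score agreement up to swapping the two labels.
agree = np.mean(pred == truth)
accuracy = max(agree, 1 - agree)
print(f"fraction of nodes assigned to the correct block: {accuracy:.2f}")
```

The embedding dimension and the number of clusters are both set to the number of blocks here; in practice either may have to be estimated.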

artificial intelligence, graph, machine learning, (15 more...)

arXiv.org Machine Learning

1108.2228

Country: North America > United States (0.29)

Genre: Research Report (0.50)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.93)
Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.88)

Hyperspectral Unmixing Overview: Geometrical, Statistical, and Sparse Regression-Based Approaches

Bioucas-Dias, José M., Plaza, Antonio, Dobigeon, Nicolas, Parente, Mario, Du, Qian, Gader, Paul, Chanussot, Jocelyn

arXiv.org Machine Learning, Apr-24-2012

Imaging spectrometers measure electromagnetic energy scattered in their instantaneous field of view in hundreds or thousands of spectral channels with higher spectral resolution than multispectral cameras. Imaging spectrometers are therefore often referred to as hyperspectral cameras (HSCs). Higher spectral resolution enables material identification via spectroscopic analysis, which facilitates countless applications that require identifying materials in scenarios unsuitable for classical spectroscopic analysis. Due to low spatial resolution of HSCs, microscopic material mixing, and multiple scattering, spectra measured by HSCs are mixtures of spectra of materials in a scene. Thus, accurate estimation requires unmixing. Pixels are assumed to be mixtures of a few materials, called endmembers. Unmixing involves estimating all or some of: the number of endmembers, their spectral signatures, and their abundances at each pixel. Unmixing is a challenging, ill-posed inverse problem because of model inaccuracies, observation noise, environmental conditions, endmember variability, and data set size. Researchers have devised and investigated many models searching for robust, stable, tractable, and accurate unmixing algorithms. This paper presents an overview of unmixing methods from the time of Keshava and Mustard's unmixing tutorial [1] to the present. Mixing models are first discussed. Signal-subspace, geometrical, statistical, sparsity-based, and spatial-contextual unmixing algorithms are described. Mathematical problems and potential solutions are described. Algorithm characteristics are illustrated experimentally.
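As a concrete reference point, the linear mixing model is one widely used way to formalise "pixels are mixtures of a few endmembers". The abstract does not say which models the paper covers, so the form and the symbols below are assumptions chosen for illustration: y is the spectrum observed at one pixel over B bands, M is the B-by-p matrix whose columns are the endmember signatures, alpha holds the p abundances, and n is additive noise.

```latex
\mathbf{y} = \mathbf{M}\boldsymbol{\alpha} + \mathbf{n},
\qquad \alpha_i \ge 0 \quad (i = 1, \dots, p),
\qquad \sum_{i=1}^{p} \alpha_i = 1
```

Under this model, unmixing amounts to estimating p, the columns of M, and the vector alpha at each pixel, which matches the three quantities listed in the abstract.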

artificial intelligence, data mining, machine learning, (18 more...)

arXiv.org Machine Learning

1202.6294

Country:

Europe (1.00)
North America > United States > Massachusetts (0.27)
North America > United States > Florida (0.27)

Genre:

Overview (1.00)
Research Report > Promising Solution (0.34)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Energy (0.68)
Government > Military > Army (0.67)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
(4 more...)

Efficient hierarchical clustering for continuous data

Henao, Ricardo, Lucas, Joseph E.

arXiv.org Machine Learning, Apr-20-2012

Learning hierarchical structures from observed data is a common practice in many knowledge domains. Examples include phylogenies and signaling pathways in biology, language models in linguistics, etc. Agglomerative clustering is still the most popular approach to hierarchical clustering due to its efficiency, ease of implementation and wide range of possible distance metrics. However, because it is algorithmic in nature, there is no principled way in which agglomerative clustering can be used as a building block in more complex models. Bayesian priors for structure learning, on the other hand, are perfectly suited to be employed in larger models. As an example, several authors have proposed using hierarchical structure priors to model correlation in factor models (Rai and Daume III, 2009; Henao et al., 2012; Zhang et al., 2011). Ricardo Henao is Postdoctoral Associate and Joseph E. Lucas is Assistant Research Professor at the Institute for Genome Sciences and Policy (IGSP), Duke University, Durham, NC 27710.
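To make the "ease of implementation" and "range of distance metrics" points concrete, here is a minimal sketch of plain agglomerative clustering with single linkage. It is the classical baseline the abstract contrasts against, not the authors' Bayesian method; the function names and the toy points are illustrative assumptions, and the naive search is cubic in the number of points.

```python
from itertools import combinations

def agglomerative(points, n_clusters, distance):
    """Naive single-linkage agglomerative clustering; returns clusters of point indices."""
    clusters = [[i] for i in range(len(points))]
    merges = []  # merge history: (left members, right members, linkage distance)
    while len(clusters) > n_clusters:
        # Find the pair of clusters whose closest members are nearest to each other.
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            link = min(distance(points[i], points[j])
                       for i in clusters[a] for j in clusters[b])
            if best is None or link < best[0]:
                best = (link, a, b)
        link, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), link))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe because a < b
    return clusters, merges

def euclidean(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q)) ** 0.5

points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.0), (5.0, 5.0), (5.1, 4.9), (9.0, 0.0)]
clusters, merges = agglomerative(points, n_clusters=3, distance=euclidean)
print(sorted(sorted(c) for c in clusters))
```

Any callable can be passed as `distance`, which is where the flexibility in metrics comes from; the merge history is the dendrogram in list form.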

artificial intelligence, equation, machine learning, (19 more...)

arXiv.org Machine Learning

1204.4708

Country: North America > United States > North Carolina > Durham County > Durham (0.24)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Automatic Sampling of Geographic objects

Taillandier, Patrick, Gaffuri, Julien

arXiv.org Artificial Intelligence, Apr-20-2012

Today, large datasets composed of thousands of geographic objects are available. However, many processes require the appraisal of an expert or a great deal of computation time, so only a small part of these objects can be taken into account. In this context, robust sampling methods become necessary. In this paper, we propose a sampling method based on clustering techniques. Our method consists of dividing the objects into clusters and then selecting the most representative objects from each cluster. A case study in the context of a process dedicated to knowledge revision for geographic data generalisation is presented. This case study shows that our method selects relevant samples of objects.
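The two steps in this abstract (cluster the objects, then keep the most representative object of each cluster) can be sketched as follows. The abstract does not name the clustering algorithm or define "most representative", so this sketch assumes k-means and takes the object closest to each cluster mean; the feature vectors are made up for illustration.

```python
def sq_dist(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q))

def mean(points):
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def kmeans(points, k, n_iter=20):
    """Lloyd's algorithm; farthest-point initialisation keeps the sketch deterministic."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centers[j]))
            groups[nearest].append(p)
        centers = [mean(g) if g else centers[j] for j, g in enumerate(groups)]
    return groups, centers

def representative_sample(points, k):
    """Return one object per non-empty cluster: the one closest to the cluster mean."""
    groups, centers = kmeans(points, k)
    return [min(g, key=lambda p: sq_dist(p, c)) for g, c in zip(groups, centers) if g]

# Hypothetical two-feature descriptions of geographic objects.
objects = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.2), (1.0, 1.1),
           (8.0, 8.0), (8.2, 7.9), (7.9, 8.3), (8.0, 8.1)]
sample = representative_sample(objects, k=2)
print(sample)
```

Returning an actual object rather than the cluster mean matters here, because the sample is meant to be handed to an expert or an expensive process that works on real objects.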

artificial intelligence, expert system, machine learning, (19 more...)

arXiv.org Artificial Intelligence

1204.4541

Country: Asia > Vietnam (0.15)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.36)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.33)

Clustering using Max-norm Constrained Optimization

Jalali, Ali, Srebro, Nathan

arXiv.org Machine Learning, Apr-13-2012

We suggest using the max-norm as a convex surrogate constraint for clustering. We show how this yields a better exact cluster recovery guarantee than previously suggested nuclear-norm relaxation, and study the effectiveness of our method, and other related convex relaxations, compared to other clustering approaches.
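For readers unfamiliar with the two norms being compared, their standard factorization forms are given below. These are the usual textbook definitions rather than anything quoted from the paper, and the symbols are assumptions: X is the matrix being constrained, U and V range over all factorizations of X, and the (2, infinity) norm of a matrix is its largest row-wise Euclidean norm.

```latex
\|X\|_{\max} = \min_{X = U V^{\top}} \|U\|_{2,\infty}\,\|V\|_{2,\infty},
\qquad
\|X\|_{*} = \min_{X = U V^{\top}} \|U\|_{F}\,\|V\|_{F},
\qquad
\|U\|_{2,\infty} = \max_{i} \|U_{i,:}\|_{2}
```

The max-norm bounds every row of the factors, whereas the nuclear norm only bounds them on average through the Frobenius norm.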

artificial intelligence, machine learning, relaxation, (17 more...)

arXiv.org Machine Learning

1202.5598

Country: North America > United States (1.00)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.69)

Transforming Graph Representations for Statistical Relational Learning

Rossi, Ryan A., McDowell, Luke K., Aha, David W., Neville, Jennifer

arXiv.org Artificial Intelligence, Mar-30-2012

Relational data representations have become an increasingly important topic due to the recent proliferation of network datasets (e.g., social, biological, information networks) and a corresponding increase in the application of statistical relational learning (SRL) algorithms to these domains. In this article, we examine a range of representation issues for graph-based relational data. Since the choice of relational data representation--for the nodes, links, and features--can dramatically affect the capabilities of SRL algorithms, we survey approaches and opportunities for relational representation transformation designed to improve the performance of these algorithms. This leads us to introduce an intuitive taxonomy for data representation transformations in relational domains that incorporates link transformation and node transformation as symmetric representation tasks. In particular, the transformation tasks for both nodes and links include (i) predicting their existence, (ii) predicting their label or type, (iii) estimating their weight or importance, and (iv) systematically constructing their relevant features. We motivate our taxonomy through detailed examples and use it to survey and compare competing approaches for each of these tasks. We also discuss general conditions for transforming links, nodes, and features. Finally, we highlight challenges that remain to be addressed.

machine learning, natural language, node, (18 more...)

arXiv.org Artificial Intelligence

1204.0033

Country:

Europe (0.67)
North America > United States > Massachusetts (0.27)

Genre:

Overview (1.00)
Research Report > Experimental Study (0.46)

Industry:

Information Technology > Services (1.00)
Health & Medicine (0.92)
Telecommunications (0.67)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
(3 more...)

A Semantic Metadirectory of Services Based on Web Mining Techniques

Fernández-Villamor, José Ignacio (Universidad Politecnica de Madrid) | Zemke, Tilo (Technische Universitaet Chemnitz) | Iglesias, Carlos Ángel (Universidad Politecnica de Madrid) | Garijo, Mercedes (Universidad Politecnica de Madrid)

AAAI Conferences, Mar-25-2012

In the current web, developers are able to create new applications by composing already existing services from third-party vendors. However, the vast number of choices, technologies and repositories can make this a tedious task. This paper describes a semantic metadirectory of services that helps in the process of discovering services. We propose a semantic service discovery process and a description of existing service repositories such as Programmable Web and Yahoo Pipes, two repositories that provide plenty of services that developers can reuse to build new web applications. The challenges in integrating these repositories were defining a common model, identifying relevant data, and integrating and ranking the extracted data.

developer, metadirectory, repository, (13 more...)

AAAI Conferences

2012 AAAI Spring Symposium Series

Country:

Europe > Spain > Galicia > Madrid (0.05)
Europe > Germany (0.04)

Genre: Research Report (0.46)

Industry: Materials > Metals & Mining (0.40)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Communications > Web (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

Bayesian Rose Trees

Blundell, Charles, Teh, Yee Whye, Heller, Katherine A.

arXiv.org Machine Learning, Mar-15-2012

Hierarchical structure is ubiquitous in data across many domains. There are many hierarchical clustering methods, frequently used by domain experts, which strive to discover this structure. However, most of these methods limit discoverable hierarchies to those with binary branching structure. This limitation, while computationally convenient, is often undesirable. In this paper we explore a Bayesian hierarchical clustering algorithm that can produce trees with arbitrary branching structure at each node, known as rose trees. We interpret these trees as mixtures over partitions of a data set, and use a computationally efficient, greedy agglomerative algorithm to find the rose trees which have high marginal likelihood given the data. Lastly, we perform experiments which demonstrate that rose trees are better models of data than the typical binary trees returned by other hierarchical clustering algorithms.

artificial intelligence, machine learning, partition, (19 more...)

arXiv.org Machine Learning

1203.3468

Country: Europe > United Kingdom > England (0.28)

Genre: Research Report (0.82)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.93)

Subspace clustering of high-dimensional data: a predictive approach

McWilliams, Brian, Montana, Giovanni

arXiv.org Machine Learning, Mar-5-2012

In several application domains, high-dimensional observations are collected and then analysed in search of naturally occurring data clusters which might provide further insights about the nature of the problem. In this paper we describe a new approach for partitioning such high-dimensional data. Our assumption is that, within each cluster, the data can be approximated well by a linear subspace estimated by means of a principal component analysis (PCA). The proposed algorithm, Predictive Subspace Clustering (PSC), partitions the data into clusters while simultaneously estimating cluster-wise PCA parameters. The algorithm minimises an objective function that depends upon a new measure of influence for PCA models. A penalised version of the algorithm is also described for carrying out simultaneous subspace clustering and variable selection. The convergence of PSC is discussed in detail, and extensive simulation results and comparisons to competing methods are presented. The comparative performance of PSC has been assessed on six real gene expression data sets for which PSC often provides state-of-the-art results.

artificial intelligence, data mining, machine learning, (17 more...)

arXiv.org Machine Learning

1203.1065

Country: North America > United States (0.68)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.95)
Information Technology > Data Science > Data Mining (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.89)

Evolutionary Clustering and Analysis of User Behaviour in Online Forums

Morrison, Donn (Digital Enterprise Research Institute) | McLoughlin, Ian (Digital Enterprise Research Institute) | Hogan, Alice (Digital Enterprise Research Institute) | Hayes, Conor (Digital Enterprise Research Institute)

AAAI Conferences, Feb-22-2012

In this paper we cluster and analyse temporal user behaviour in online communities. We adapt a simple unsupervised clustering algorithm to an evolutionary setting where we cluster users into prototypical behavioural roles based on features derived from their ego-centric reply-graphs. We then analyse changes in the role membership of the users over time, the change in role composition of forums over time and examine the differences between forums in terms of role composition. We perform this analysis on 200 forums from a popular national bulletin board and 14 enterprise technical support forums.

artificial intelligence, data mining, machine learning, (15 more...)

AAAI Conferences

Sixth International AAAI Conference on Weblogs and Social Media

Country: Europe > Ireland > Connaught > County Galway > Galway (0.04)

Genre: Research Report (0.31)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.55)
