AITopics | Clustering


Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)
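As a concrete illustration of this definition, the following is a minimal sketch of Lloyd's k-means, one common clustering method, in plain Python. The function names, the use of squared Euclidean distance as the similarity measure, and the naive initialization from the first `k` points are illustrative assumptions, not part of the definition.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], k: int, max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's algorithm: alternate between assigning points and recomputing centroids."""
    # Naive, deterministic initialization (an assumption): the first k points become centroids.
    centroids = [list(p) for p in points[:k]]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster whose centroid is nearest.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append([sum(coord) / len(members) for coord in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # an empty cluster keeps its old centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids


labels, centroids = kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], k=2)
print(labels)  # [0, 0, 1, 1]
```

Points in the same group end up closer to their shared centroid than to the other group's centroid, which is one concrete reading of "more similar to each other than to those in other groups".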


Probabilistic Combination of Classifier and Cluster Ensembles for Non-transductive Learning

Acharya, Ayan, Hruschka, Eduardo R., Ghosh, Joydeep, Sarwar, Badrul, Ruvini, Jean-David

arXiv.org Machine Learning, Nov-10-2012

Unsupervised models can provide supplementary soft constraints to help classify new target data under the assumption that similar objects in the target set are more likely to share the same class label. Such models can also help detect possible differences between training and target distributions, which is useful in applications where concept drift may take place. This paper describes a Bayesian framework that takes as input class labels from existing classifiers (designed based on labeled data from the source domain), as well as cluster labels from a cluster ensemble operating solely on the target data to be classified, and yields a consensus labeling of the target data. This framework is particularly useful when the statistics of the target data drift or change from those of the training data. We also show that the proposed framework is privacy-aware and allows performing distributed learning when data/models have sharing restrictions. Experiments show that our framework can yield superior results to those provided by applying classifier ensembles only.
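The abstract's central assumption, that objects grouped together by a cluster ensemble on the target data are likely to share a class label, can be illustrated with a far simpler heuristic than the paper's Bayesian framework. The sketch below (a hypothetical `consensus_labels` function with an assumed mixing weight `alpha`) blends each object's classifier-ensemble class probabilities with the average probabilities of the clusters it falls into. It is not the authors' model and offers none of its privacy or distributed-learning properties.

```python
from typing import Dict, List


def consensus_labels(class_probs: List[List[float]], clusterings: List[List[int]],
                     alpha: float = 0.5) -> List[int]:
    """Blend classifier probabilities with cluster-averaged probabilities, then take argmax.

    class_probs[i][c]: classifier-ensemble probability that target object i has class c.
    clusterings[e][i]: cluster id of object i in ensemble member e (fit on target data only).
    alpha: hypothetical weight on the cluster-based evidence (0 means classifiers only).
    """
    n, num_classes = len(class_probs), len(class_probs[0])
    cluster_avg = [[0.0] * num_classes for _ in range(n)]
    for labels in clusterings:
        # Average the class probabilities over the members of each cluster.
        sums: Dict[int, List[float]] = {}
        counts: Dict[int, int] = {}
        for i, g in enumerate(labels):
            acc = sums.setdefault(g, [0.0] * num_classes)
            for c in range(num_classes):
                acc[c] += class_probs[i][c]
            counts[g] = counts.get(g, 0) + 1
        # Each object receives its cluster's average, averaged over ensemble members.
        for i, g in enumerate(labels):
            for c in range(num_classes):
                cluster_avg[i][c] += sums[g][c] / counts[g] / len(clusterings)
    result = []
    for i in range(n):
        scores = [(1 - alpha) * class_probs[i][c] + alpha * cluster_avg[i][c]
                  for c in range(num_classes)]
        result.append(max(range(num_classes), key=lambda c: scores[c]))
    return result


probs = [[0.9, 0.1], [0.8, 0.2], [0.45, 0.55], [0.1, 0.9]]
ensemble = [[0, 0, 0, 1], [5, 5, 5, 7]]
print(consensus_labels(probs, ensemble))  # [0, 0, 0, 1]
```

Here object 2 is weakly assigned to class 1 by the classifiers, but its cluster-mates pull it to class 0, which is the kind of soft constraint the abstract describes.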

artificial intelligence, ensemble, machine learning, (17 more...)

arXiv.org Machine Learning

1211.2304

Country: North America > United States > Texas (0.28)

Genre: Research Report (0.64)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.66)


Block Modeling in Large Social Networks with Many Clusters

Biesan, Shawn (Baldwin Wallace University) | Anthony, Adam (Baldwin Wallace University) | desJardins, Marie (University of Maryland Baltimore County)

AAAI Conferences, Nov-5-2012

In this paper, we present an optimized version of the previously developed Block Modularity algorithm (Anthony, 2009). The original algorithm was a fast, greedy method that effectively discovered a structured clustering in linked data and scaled very well with the number of nodes and edges. The optimized version is scalable in terms of the model complexity; the technique can now be used effectively to discover thousands of clusters in data sets with hundreds of thousands (and possibly more) nodes and edges. The optimization leads to an improvement of the runtime per iteration from cubic to quadratic with a small increase in the constant factor. The algorithm compares favorably with Karrer and Newman's Degree-Corrected Block Model (DCBM) in both runtime and quality of results.
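A block model summarizes a clustering of a graph by how densely each pair of clusters (blocks) is linked. The sketch below computes that block-density matrix for an undirected simple graph (no duplicate edges assumed) from a node-to-block assignment. It only illustrates the kind of structure that block modeling discovers; it does not implement the Block Modularity objective or the optimized search, and the function name `block_densities` is hypothetical.

```python
from typing import List, Tuple


def block_densities(num_blocks: int, labels: List[int],
                    edges: List[Tuple[int, int]]) -> List[List[float]]:
    """Fraction of possible node pairs that are linked, for every pair of blocks."""
    sizes = [0] * num_blocks
    for b in labels:
        sizes[b] += 1
    counts = [[0] * num_blocks for _ in range(num_blocks)]
    # One pass over the edges counts the links within and between blocks.
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        r, s = labels[u], labels[v]
        counts[r][s] += 1
        if r != s:
            counts[s][r] += 1  # keep the matrix symmetric
    dens = [[0.0] * num_blocks for _ in range(num_blocks)]
    for r in range(num_blocks):
        for s in range(num_blocks):
            # Possible pairs: n_r * n_s between blocks, n_r choose 2 within a block.
            pairs = sizes[r] * (sizes[r] - 1) // 2 if r == s else sizes[r] * sizes[s]
            dens[r][s] = counts[r][s] / pairs if pairs else 0.0
    return dens


# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(block_densities(2, [0, 0, 0, 1, 1, 1], edges))
```

A good block assignment yields a matrix whose entries are close to 0 or 1, so each block is described compactly by its link pattern to the other blocks.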

algorithm, bm-opt, vertex, (16 more...)

AAAI Conferences

2012 AAAI Fall Symposium Series

Country:

North America > United States > Maryland > Baltimore (0.14)
North America > United States > Maryland > Baltimore County (0.04)
North America > United States > Ohio > Cuyahoga County > Berea (0.04)

Industry: Information Technology > Services (0.52)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.31)


Discovering Protein Clusters

Epstein, Susan (Hunter College and The Graduate Center of The City University of New York) | Li, Xingjian (Microsoft Online Services Division) | Valdez, Peter (Hunter College of The City University of New York) | Grayevsky, Sofia (Hunter College of The City University of New York) | Osisek, Eric (The Graduate Center of The City University of New York) | Yun, Xi (The Graduate Center of The City University of New York) | Xie, Lei (Hunter College of The City University of New York)

AAAI Conferences, Nov-5-2012

As biological data about genes and their interactions proliferates, scientists have the opportunity to identify sets of proteins whose interactions make them worthy of further investigation. This paper reports on a knowledge discovery technique to support that work. Foretell is an algorithm originally designed to support search for solutions to constraint satisfaction problems. Recent adaptations enable Foretell to detect sets of genes that interact heavily with one another. We provide empirical results, and describe ongoing work on biological meaning and knowledge infusion from the user.
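The abstract does not describe how Foretell scores candidate gene sets, so the sketch below substitutes a well-known, unrelated technique with the same goal of finding a heavily interacting group: greedy peeling for the densest-subgraph problem. It repeatedly removes the vertex with the fewest remaining neighbors and keeps the intermediate subgraph with the highest edges-per-vertex ratio. The function name and this particular density measure are assumptions for illustration.

```python
from typing import List, Set, Tuple


def densest_subgraph_peeling(n: int, edges: List[Tuple[int, int]]) -> Tuple[Set[int], float]:
    """Greedy peeling: return the intermediate vertex set with the highest |E| / |V|."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    alive = set(range(n))
    m = sum(len(nbrs) for nbrs in adj.values()) // 2  # edges among the remaining vertices
    best_set, best_density = set(alive), (m / n if n else 0.0)
    while len(alive) > 1:
        # Peel off the vertex with the fewest remaining neighbors.
        v = min(alive, key=lambda x: len(adj[x]))
        m -= len(adj[v])
        for u in adj[v]:
            adj[u].discard(v)
        adj[v].clear()
        alive.remove(v)
        density = m / len(alive)
        if density > best_density:
            best_set, best_density = set(alive), density
    return best_set, best_density


# A tightly connected group {0, 1, 2, 3} with a sparse tail 3-4-5-6.
k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(densest_subgraph_peeling(7, k4 + [(3, 4), (4, 5), (5, 6)]))  # ({0, 1, 2, 3}, 1.5)
```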

artificial intelligence, foretell, machine learning, (18 more...)

AAAI Conferences

2012 AAAI Fall Symposium Series

Country:

Europe > France > Pays de la Loire > Loire-Atlantique > Nantes (0.04)
North America > United States > Washington > King County > Redmond (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > District of Columbia > Washington (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.47)


Transforming Graph Data for Statistical Relational Learning

Rossi, R. A., McDowell, L. K., Aha, D. W., Neville, J.

Journal of Artificial Intelligence Research, Oct-30-2012

Relational data representations have become an increasingly important topic due to the recent proliferation of network datasets (e.g., social, biological, information networks) and a corresponding increase in the application of Statistical Relational Learning (SRL) algorithms to these domains. In this article, we examine and categorize techniques for transforming graph-based relational data to improve SRL algorithms. In particular, appropriate transformations of the nodes, links, and/or features of the data can dramatically affect the capabilities and results of SRL algorithms. We introduce an intuitive taxonomy for data representation transformations in relational domains that incorporates link transformation and node transformation as symmetric representation tasks. More specifically, the transformation tasks for both nodes and links include (i) predicting their existence, (ii) predicting their label or type, (iii) estimating their weight or importance, and (iv) systematically constructing their relevant features. We motivate our taxonomy through detailed examples and use it to survey competing approaches for each of these tasks. We also discuss general conditions for transforming links, nodes, and features. Finally, we highlight challenges that remain to be addressed.
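Task (i), predicting whether a link exists, can be illustrated with one of the simplest link-prediction scores: rank each currently unlinked pair of nodes by how many neighbors the two nodes share. The sketch below is only an example of a link transformation under that assumed score, chosen for brevity; the function name is hypothetical, and the article surveys a broader range of approaches.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple


def common_neighbor_scores(edges: List[Tuple[str, str]]) -> List[Tuple[Tuple[str, str], int]]:
    """Rank unlinked node pairs by their number of shared neighbors (highest first)."""
    adj: Dict[str, Set[str]] = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    scores = {}
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u]:
            continue  # already linked: nothing to predict
        shared = len(adj[u] & adj[v])
        if shared:
            scores[(u, v)] = shared
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
print(common_neighbor_scores(edges))
# [(('a', 'd'), 2), (('b', 'c'), 2), (('b', 'e'), 1), (('c', 'e'), 1)]
```

The top-ranked pairs would be candidate links to add to the graph before running an SRL algorithm.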

knowledge discovery, representation, transforming graph data, (13 more...)

Journal of Artificial Intelligence Research

doi: 10.1613/jair.3659

AI Access Foundation

10786


Country:

Asia > Middle East > Jordan (0.04)
Africa > Senegal > Kolda Region > Kolda (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(11 more...)

Genre:

Overview (1.00)
Research Report > Experimental Study (0.46)

Industry:

Information Technology > Services (1.00)
Government > Regional Government > North America Government > United States Government (0.92)
Health & Medicine (0.92)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
(3 more...)


A Biomimetic Approach Based on Immune Systems for Classification of Unstructured Data

Hamou, Mohamed, Amine, Abdelmalek, Lokbani, Ahmed Chaouki

arXiv.org Artificial Intelligence, Oct-25-2012

In this paper we present the results of clustering unstructured data, in this case textual data from the Reuters-21578 corpus, with a new biomimetic approach based on an immune system. Before running our immune system, we converted the textual data into numerical form using an n-gram approach. The novelty lies in the hybridization of n-grams and immune systems for clustering. The experimental results show that the proposed ideas are promising and that this method can solve the text clustering problem.
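The n-gram step can be illustrated with character trigrams: each document becomes a bag of overlapping three-character substrings, and documents can then be compared, for example by cosine similarity. The sketch below shows only this representation. It does not implement the immune-system clustering or load the Reuters-21578 corpus, and the choice of character (rather than word) n-grams, n = 3, and cosine similarity are assumptions.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Bag of overlapping character n-grams after lowercasing and collapsing whitespace."""
    text = " ".join(text.lower().split())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


docs = ["Oil prices rise", "Oil prices fall", "Wheat harvest"]
vecs = [char_ngrams(d) for d in docs]
print(cosine(vecs[0], vecs[1]) > cosine(vecs[0], vecs[2]))  # True
```

Any clustering method, including an immune-inspired one, could then operate on these vectors or on the pairwise similarities.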

classification, evolutionary algorithm, machine learning, (18 more...)

arXiv.org Artificial Intelligence

1210.7002

Country:

Europe (0.46)
Africa > Middle East > Algeria (0.14)

Genre: Research Report > New Finding (0.48)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
(2 more...)


Learning Generative Models of Similarity Matrices

Rosales, Romer, Frey, Brendan J.

arXiv.org Machine Learning, Oct-19-2012

We describe a probabilistic (generative) view of affinity matrices along with inference algorithms for a subclass of problems associated with data clustering. This probabilistic view is helpful in understanding different models and algorithms that are based on affinity functions of the data. In particular, we show how (greedy) inference for a specific probabilistic model is equivalent to the spectral clustering algorithm. The probabilistic view also provides a framework for developing new algorithms and extended models. As one case, we present new generative data clustering models that allow us to infer the underlying distance measure suitable for the clustering problem at hand. These models seem to perform well in a larger class of problems for which other clustering algorithms (including spectral clustering) usually fail. Experimental evaluation was performed on a variety of point data sets, showing excellent performance.
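For reference, the sketch below shows plain spectral clustering in its simplest two-cluster form: build a Gaussian affinity matrix, form the unnormalized graph Laplacian L = D - W, and split the points by the sign of its second eigenvector (the Fiedler vector). It is the standard algorithm the abstract relates its model to, not the paper's generative model or its learned distance measure; the bandwidth `sigma` and the toy points are assumptions, and it requires NumPy.

```python
import numpy as np


def spectral_bipartition(X, sigma: float = 1.0) -> np.ndarray:
    """Split points in two by the sign of the Fiedler vector of the graph Laplacian."""
    X = np.asarray(X, dtype=float)
    # Gaussian affinity W_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)), with no self-affinity.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Unnormalized Laplacian L = D - W, where D holds the row sums of W.
    L = np.diag(W.sum(axis=1)) - W
    # eigh returns eigenvalues in ascending order, so column 1 is the Fiedler vector.
    _, vecs = np.linalg.eigh(L)
    return (vecs[:, 1] > 0).astype(int)


points = [(0, 0), (0, 0.5), (0.5, 0), (3, 0), (3, 0.5), (3.5, 0)]
print(spectral_bipartition(points))
```

The sign of an eigenvector is arbitrary, so which group is labeled 0 can vary; only the grouping itself is meaningful.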

data mining, machine learning, spectral, (20 more...)

arXiv.org Machine Learning

1212.2494

Country:

North America > United States > California (0.28)
North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)


Markov Random Walk Representations with Continuous Distributions

Yeang, Chen-Hsiang, Szummer, Martin

arXiv.org Machine Learning, Oct-19-2012

Representations based on random walks can exploit discrete data distributions for clustering and classification. We extend such representations from discrete to continuous distributions. Transition probabilities are now calculated using a diffusion equation with a diffusion coefficient that inversely depends on the data density. We relate this diffusion equation to a path integral and derive the corresponding path probability measure. The framework is useful for incorporating continuous data densities and prior knowledge.
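The discrete setting that this work starts from can be sketched directly: normalizing a weighted adjacency matrix row by row gives one-step transition probabilities P = D^{-1}W, and the rows of P^t describe where a t-step walk from each node tends to end up, so nodes in the same cluster get similar rows. The sketch below covers only this discrete case, not the continuous diffusion extension; the toy graph, the self-loops, and t = 3 are illustrative assumptions, and it requires NumPy.

```python
import numpy as np


def random_walk_representation(W, t: int) -> np.ndarray:
    """Row i holds the t-step transition probabilities of a walk started at node i."""
    W = np.asarray(W, dtype=float)
    P = W / W.sum(axis=1, keepdims=True)  # one-step transitions: P = D^{-1} W
    return np.linalg.matrix_power(P, t)


# Two triangles joined by one weak edge (2-3); self-loops (the identity) make the walk lazy.
W = np.eye(6)
for u, v, w in [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
                (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0), (2, 3, 0.1)]:
    W[u, v] = W[v, u] = w
R = random_walk_representation(W, t=3)
within = np.abs(R[0] - R[1]).sum()   # two nodes in the same triangle
between = np.abs(R[0] - R[3]).sum()  # nodes in different triangles
print(within < between)  # True
```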

artificial intelligence, data mining, machine learning, (14 more...)

arXiv.org Machine Learning

1212.251

Country: North America > United States (0.46)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science > Data Mining (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.34)


Fast Graph Construction Using Auction Algorithm

Wang, Jun, Xia, Yinglong

arXiv.org Machine Learning, Oct-16-2012

In practical machine learning systems, graph-based data representation has been widely used in various learning paradigms, ranging from unsupervised clustering to supervised classification. Besides applications with natural graph or network-structured data, such as social network analysis and relational learning, many other applications involve a critical step of converting data vectors into an adjacency graph. In particular, a sparse subgraph extracted from the original graph is often required for both theoretical and practical reasons. Previous studies clearly show that the performance of different learning algorithms, e.g., clustering and classification, benefits from such sparse subgraphs with balanced node connectivity. However, existing graph construction methods are either computationally expensive or perform unsatisfactorily. In this paper, we utilize a scalable method called the auction algorithm and its parallel extension to recover a sparse yet nearly balanced subgraph with significantly reduced computational cost. Empirical study and comparison with state-of-the-art approaches clearly demonstrate the superiority of the proposed method in both efficiency and accuracy.
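The abstract names the auction algorithm without detail. For orientation, the sketch below is the classical auction algorithm for the assignment problem: each unassigned "person" bids for its most valuable "object", raising that object's price by its margin over the second-best object plus a small `eps`, and an outbid person re-enters the auction. The paper applies an auction approach and a parallel extension to sparse, nearly balanced subgraph construction, which this sketch does not attempt; the function name and the integer-benefit assumption are illustrative.

```python
from typing import List, Optional


def auction_assignment(benefit: List[List[int]], eps: Optional[float] = None) -> List[int]:
    """Return assigned[i] = object given to person i, maximizing total integer benefit."""
    n = len(benefit)
    if eps is None:
        eps = 1.0 / (n + 1)  # eps < 1/n yields an optimal result for integer benefits
    prices = [0.0] * n
    owner = [-1] * n      # owner[j]: person currently holding object j (-1 if none)
    assigned = [-1] * n   # assigned[i]: object currently held by person i
    unassigned = list(range(n))
    while unassigned:
        i = unassigned.pop()
        # Net value of each object to person i at the current prices.
        values = [benefit[i][j] - prices[j] for j in range(n)]
        best_j = max(range(n), key=lambda j: values[j])
        second = max((values[j] for j in range(n) if j != best_j), default=values[best_j])
        # Bid: raise the price by the margin over the second-best object, plus eps.
        prices[best_j] += values[best_j] - second + eps
        previous = owner[best_j]
        if previous != -1:
            assigned[previous] = -1
            unassigned.append(previous)  # the outbid person must bid again
        owner[best_j] = i
        assigned[i] = best_j
    return assigned


print(auction_assignment([[10, 2, 3], [3, 9, 1], [2, 1, 8]]))  # [0, 1, 2]
```

Because each bid touches only one person's row, many bids can be computed independently, which is what makes parallel variants of the auction idea attractive.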

artificial intelligence, graph, machine learning, (14 more...)

arXiv.org Machine Learning

1210.4917

Country:

North America > United States (0.46)
North America > Canada (0.28)

Genre: Research Report (1.00)

Industry: Information Technology (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)


Unsupervised Joint Alignment and Clustering using Bayesian Nonparametrics

Mattar, Marwan A., Hanson, Allen R., Learned-Miller, Erik G.

arXiv.org Machine Learning, Oct-16-2012

Joint alignment of a collection of functions is the process of independently transforming the functions so that they appear more similar to each other. Typically, such unsupervised alignment algorithms fail when presented with complex data sets arising from multiple modalities, or they make restrictive assumptions about the form of the functions or transformations, limiting their generality. We present a transformed Bayesian infinite mixture model that can simultaneously align and cluster a data set. Our model and associated learning scheme offer two key advantages: the optimal number of clusters is determined in a data-driven fashion through the use of a Dirichlet process prior, and it can accommodate any transformation function parameterized by a continuous parameter vector. As a result, it is applicable to a wide range of data types and transformation functions. We present positive results on synthetic two-dimensional data, on a set of one-dimensional curves, and on various image data sets, showing large improvements over previous work. We discuss several variations of the model and conclude with directions for future work.
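The Dirichlet process prior is what lets the number of clusters grow with the data. Its clustering behavior is often described by the Chinese restaurant process: each new item joins an existing cluster with probability proportional to that cluster's size, or starts a new cluster with probability proportional to a concentration parameter `alpha`. The sketch below samples cluster assignments from that prior only; it performs no alignment, transformation, or inference as the paper's model does, and the function name and parameter values are illustrative.

```python
import random
from typing import List


def crp_assignments(n: int, alpha: float, seed: int = 0) -> List[int]:
    """Sample cluster labels for n items from a Chinese restaurant process (alpha > 0)."""
    rng = random.Random(seed)
    sizes: List[int] = []   # number of items in each existing cluster
    labels: List[int] = []
    for _ in range(n):
        # Existing clusters are weighted by size; the extra slot (a new cluster) by alpha.
        weights = sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(sizes):
            sizes.append(1)  # open a new cluster
        else:
            sizes[k] += 1
        labels.append(k)
    return labels


labels = crp_assignments(50, alpha=1.0, seed=1)
print(len(set(labels)), "clusters for 50 items")
```

Larger values of `alpha` tend to produce more clusters, which is how the prior leaves the number of clusters to be determined by the data rather than fixed in advance.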

artificial intelligence, machine learning, transformation, (15 more...)

arXiv.org Machine Learning

1210.4892

Country: North America > United States (0.46)

Genre: Research Report (0.50)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.35)


A Model-Based Approach to Rounding in Spectral Clustering

Poon, Leonard K. M., Liu, April H., Liu, Tengfei, Zhang, Nevin Lianwen

arXiv.org Machine Learning, Oct-16-2012

In spectral clustering, one defines a similarity matrix for a collection of data points, transforms the matrix to get the Laplacian matrix, finds the eigenvectors of the Laplacian matrix, and obtains a partition of the data using the leading eigenvectors. The last step is sometimes referred to as rounding, where one needs to decide how many leading eigenvectors to use, to determine the number of clusters, and to partition the data points. In this paper, we propose a novel method for rounding. The method differs from previous methods in three ways. First, we relax the assumption that the number of clusters equals the number of eigenvectors used. Second, when deciding the number of leading eigenvectors to use, we not only rely on information contained in the leading eigenvectors themselves, but also use subsequent eigenvectors. Third, our method is model-based and solves all three subproblems of rounding using a class of graphical models called latent tree models. We evaluate our method on both synthetic and real-world data. The results show that our method works correctly in the ideal case where between-cluster similarity is 0, and degrades gracefully as one moves away from the ideal case.
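In the ideal case mentioned at the end of the abstract (zero similarity between clusters, each cluster internally connected), the number of zero eigenvalues of the unnormalized Laplacian equals the number of clusters, and points in the same cluster have identical rows in the matrix of leading eigenvectors. The sketch below exploits exactly that baseline: it counts near-zero eigenvalues and groups identical rows. Unlike the paper's latent-tree-model method, it assumes the number of clusters equals the number of eigenvectors used and is not designed for nonzero between-cluster similarity; the function name and tolerances are hypothetical, and it requires NumPy.

```python
import numpy as np


def ideal_case_rounding(W, eig_tol: float = 1e-8, row_tol: float = 1e-6):
    """Count zero Laplacian eigenvalues and group points with identical eigenvector rows."""
    W = np.asarray(W, dtype=float)
    L = np.diag(W.sum(axis=1)) - W   # unnormalized Laplacian
    vals, vecs = np.linalg.eigh(L)   # eigenvalues in ascending order
    k = int(np.sum(vals < eig_tol))  # near-zero eigenvalues = connected components
    U = vecs[:, :k]                  # leading eigenvectors, one row per point
    reps, labels = [], []
    for row in U:
        for c, rep in enumerate(reps):
            if np.linalg.norm(row - rep) < row_tol:
                labels.append(c)     # same row as an earlier point: same cluster
                break
        else:
            reps.append(row)         # first point of a new cluster
            labels.append(len(reps) - 1)
    return k, labels


# Block-diagonal similarity: clusters of sizes 2, 3 and 2 with zero similarity between them.
W = np.zeros((7, 7))
for block in [(0, 1), (2, 3, 4), (5, 6)]:
    for i in block:
        for j in block:
            if i != j:
                W[i, j] = 1.0
print(ideal_case_rounding(W))  # (3, [0, 0, 1, 1, 1, 2, 2])
```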

artificial intelligence, eigenvector, machine learning, (17 more...)

arXiv.org Machine Learning

1210.4883

Country: Asia > China (0.28)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
