AITopics | Clustering

Collaborating Authors

Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)

News Overviews Instructional Materials AI-Alerts Classics

Ambiguity-Driven Fuzzy C-Means Clustering: How to Detect Uncertain Clustered Records

Ghaffari, Meysam, Ghadiri, Nasser

arXiv.org Artificial IntelligenceSep-9-2014

As a well-known clustering algorithm, Fuzzy C-Means (FCM) allows each input sample to belong to more than one cluster, providing more flexibility than non-fuzzy clustering methods. However, the accuracy of FCM is subject to false detections caused by noisy records, weak feature selection and low certainty of the algorithm in some cases. The false detections are very important in some decision-making application domains like network security and medical diagnosis, where weak decisions based on such false detections may lead to catastrophic outcomes. They are mainly emerged from making decisions about a subset of records that do not provide enough evidence to make a good decision. In this paper, we propose a method for detecting such ambiguous records in FCM by introducing a certainty factor to decrease invalid detections. This approach enables us to send the detected ambiguous records to another discrimination method for a deeper investigation, thus increasing the accuracy by lowering the error rate. Most of the records are still processed quickly and with low error rate which prevents performance loss compared to similar hybrid methods. Experimental results of applying the proposed method on several datasets from different domains show a significant decrease in error rate as well as improved sensitivity of the algorithm.

ad-fcm, artificial intelligence, machine learning, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.1007/s10489-016-0759-1

1409.2821

Genre: Research Report > New Finding (0.68)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

Automatic Dimension Selection for a Non-negative Factorization Approach to Clustering Multiple Random Graphs

Lee, Nam H., Wang, I-Jeng, Park, Youngser, Priebe, Care E., Rosen, Michael

arXiv.org Machine LearningSep-9-2014

We consider a problem of grouping multiple graphs into several clusters using singular value thesholding and non-negative factorization. We derive a model selection information criterion to estimate the number of clusters. We demonstrate our approach using "Swimmer data set" as well as simulated data set, and compare its performance with two standard clustering algorithms.

artificial intelligence, machine learning, non-negative factorization approach, (2 more...)

arXiv.org Machine Learning

1406.6315

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.53)

Add feedback

Spectral Clustering of Graphs with the Bethe Hessian

Saade, Alaa, Krzakala, Florent, Zdeborová, Lenka

arXiv.org Machine LearningSep-8-2014

Recently, it has been argued that using instead a more complicated, non-symmetric and higher dimensional operator, related to the non-backtracking walk on the graph, leads to improved performance in detecting clusters, and even to optimal performance for the stochastic block model. Here, we propose to use instead a simpler object, a symmetric real matrix known as the Bethe Hessian operator, or deformed Laplacian. We show that this approach combines the performances of the non-backtracking operator, thus detecting clusters all the way down to the theoretical limit in the stochastic block model, with the computational, theoretical and memory advantages of real symmetric matrices. Clustering a graph into groups or functional modules (sometimes called communities) is a central task in many fields ranging from machine learning to biology. A common benchmark for this problem is to consider graphs generated by the stochastic block model (SBM) [7, 22]. In this case, one considersn vertices and each of them has a group label g v { 1,...,q} . A graph is then created as follows: all edges are generated independently according to aq q matrix p of probabilities, with Pr[A u,v 1] p g u,g v.

artificial intelligence, eigenvalue, machine learning, (15 more...)

arXiv.org Machine Learning

1406.188

Country:

Europe (0.46)
North America > United States (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.47)

Add feedback

Unsupervised deconvolution of dynamic imaging reveals intratumor vascular heterogeneity

Chen, Li, Choyke, Peter L., Wang, Niya, Clarke, Robert, Bhujwalla, Zaver M., Hillman, Elizabeth M. C., Wang, Yue

arXiv.org Machine LearningSep-4-2014

Department of Biomedical Engineering, Biomedical Imaging Center, Rensselaer Polytechnic Institute, Troy, NY 12180, USA 2 With the existence of biologically distinctive malignant cells originated within the same tumor, intratumor functional heterogeneity is present in many cancers and is often manifested by the intermingled vascular compartments with distinct pharmacokinetics. However, intratumor vascular heterogeneity cannot be resolved directly by most in vivo dynamic imaging. We developed multi-tissue compartment modeling (MTCM), a completely unsupervised method of deconvoluting dynamic imaging series from heterogeneous tumors that can improve vascular characterization in many biological contexts. Applying MTCM to dynamic contrast-enhanced magnetic resonance imaging of breast cancers revealed characteristic intratumor vascular heterogeneity and therapeutic responses that were otherwise undetectable. MTCM is readily applicable to other dynamic imaging modalities for studying intratumor functional and phenotypic heterogeneity, together with a variety of foreseeable applications in the clinic. A formal mathematical description of the method and its detailed implementation is available in Methods.

artificial intelligence, bioinformatics, machine learning, (18 more...)

arXiv.org Machine Learning

doi: 10.1371/journal.pone.0112143

1306.3392

Country: North America > United States > New York > Rensselaer County > Troy (0.24)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Biomedical Informatics (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Add feedback

Axiomatic Construction of Hierarchical Clustering in Asymmetric Networks

Carlsson, Gunnar, Mémoli, Facundo, Ribeiro, Alejandro, Segarra, Santiago

arXiv.org Machine LearningSep-2-2014

This paper considers networks where relationships between nodes are represented by directed dissimilarities. The goal is to study methods for the determination of hierarchical clusters, i.e., a family of nested partitions indexed by a connectivity parameter, induced by the given dissimilarity structures. Our construction of hierarchical clustering methods is based on defining admissible methods to be those methods that abide by the axioms of value - nodes in a network with two nodes are clustered together at the maximum of the two dissimilarities between them - and transformation - when dissimilarities are reduced, the network may become more clustered but not less. Several admissible methods are constructed and two particular methods, termed reciprocal and nonreciprocal clustering, are shown to provide upper and lower bounds in the space of admissible methods. Alternative clustering methodologies and axioms are further considered. Allowing the outcome of hierarchical clustering to be asymmetric, so that it matches the asymmetry of the original data, leads to the inception of quasi-clustering methods. The existence of a unique quasi-clustering method is shown. Allowing clustering in a two-node network to proceed at the minimum of the two dissimilarities generates an alternative axiomatic construction. There is a unique clustering method in this case too. The paper also develops algorithms for the computation of hierarchical clusters using matrix powers on a min-max dioid algebra and studies the stability of the methods proposed. We proved that most of the methods introduced in this paper are such that similar networks yield similar hierarchical clustering results. Algorithms are exemplified through their application to networks describing internal migration within states of the United States (U.S.) and the interrelation between sectors of the U.S. economy.

banking & finance, dissimilarity, us government, (18 more...)

arXiv.org Machine Learning

1301.7724

Country:

Europe (0.45)
North America > United States > Montana (0.27)
North America > United States > Wyoming (0.27)
(4 more...)

Genre: Research Report (0.49)

Industry:

Transportation (1.00)
Materials (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
(3 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

An application of topological graph clustering to protein function prediction

Bowman, R. Sean, Heisterkamp, Douglas, Johnson, Jesse, O'Donnol, Danielle

arXiv.org Machine LearningAug-24-2014

We use a semisupervised learning algorithm based on a topological data analysis approach to assign functional categories to yeast proteins using similarity graphs. This new approach to analyzing biological networks yields results that are as good as or better than state of the art existing approaches.

artificial intelligence, bioinformatics, machine learning, (17 more...)

arXiv.org Machine Learning

1408.5634

Genre: Research Report (0.64)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Biomedical Informatics (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.41)

Add feedback

A Case Study in Text Mining: Interpreting Twitter Data From World Cup Tweets

Godfrey, Daniel, Johns, Caley, Meyer, Carl, Race, Shaina, Sadek, Carol

arXiv.org Machine LearningAug-21-2014

Cluster analysis is a field of data analysis that extracts underlying patterns in data. One application of cluster analysis is in text-mining, the analysis of large collections of text to find similarities between documents. We used a collection of about 30,000 tweets extracted from Twitter just before the World Cup started. A common problem with real world text data is the presence of linguistic noise. In our case it would be extraneous tweets that are unrelated to dominant themes. To combat this problem, we created an algorithm that combined the DBSCAN algorithm and a consensus matrix. This way we are left with the tweets that are related to those dominant themes. We then used cluster analysis to find those topics that the tweets describe. We clustered the tweets using k-means, a commonly used clustering algorithm, and Non-Negative Matrix Factorization (NMF) and compared the results. The two algorithms gave similar results, but NMF proved to be faster and provided more easily interpreted results. We explored our results using two visualization tools, Gephi and Wordle.

artificial intelligence, machine learning, social media, (15 more...)

arXiv.org Machine Learning

1408.5427

Country: North America > United States > North Carolina (0.46)

Genre: Research Report (0.70)

Industry:

Information Technology > Services (0.84)
Leisure & Entertainment (0.67)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

Add feedback

A Bi-clustering Framework for Consensus Problems

Tepper, Mariano, Sapiro, Guillermo

arXiv.org Machine LearningAug-20-2014

We consider grouping as a general characterization for problems such as clustering, community detection in networks, and multiple parametric model estimation. We are interested in merging solutions from different grouping algorithms, distilling all their good qualities into a consensus solution. In this paper, we propose a bi-clustering framework and perspective for reaching consensus in such grouping problems. In particular, this is the first time that the task of finding/fitting multiple parametric models to a dataset is formally posed as a consensus problem. We highlight the equivalence of these tasks and establish the connection with the computational Gestalt program, that seeks to provide a psychologically-inspired detection theory for visual events. We also present a simple but powerful bi-clustering algorithm, specially tuned to the nature of the problem we address, though general enough to handle many different instances inscribed within our characterization. The presentation is accompanied with diverse and extensive experimental results in clustering, community detection, and multiple parametric model estimation in image processing applications.

artificial intelligence, data mining, machine learning, (17 more...)

arXiv.org Machine Learning

doi: 10.1137/140967325

1405.6159

Country: North America > United States (0.92)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Cluster based RBF Kernel for Support Vector Machines

Czarnecki, Wojciech Marian, Tabor, Jacek

arXiv.org Machine LearningAug-12-2014

In the classical Gaussian SVM classification we use the feature space projection transforming points to normal distributions with fixed covariance matrices (identity in the standard RBF and the covariance of the whole dataset in Mahalanobis RBF). In this paper we add additional information to Gaussian SVM by considering local geometry-dependent feature space projection. We emphasize that our approach is in fact an algorithm for a construction of the new Gaussian-type kernel. We show that better (compared to standard RBF and Mahalanobis RBF) classification results are obtained in the simple case when the space is preliminary divided by k-means into two sets and points are represented as normal distributions with a covariances calculated according to the dataset partitioning. We call the constructed method C$_k$RBF, where $k$ stands for the amount of clusters used in k-means. We show empirically on nine datasets from UCI repository that C$_2$RBF increases the stability of the grid search (measured as the probability of finding good parameters).

artificial intelligence, kernel, machine learning, (16 more...)

arXiv.org Machine Learning

1408.2869

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.87)

Add feedback

Warped Mixtures for Nonparametric Cluster Shapes

Iwata, Tomoharu, Duvenaud, David, Ghahramani, Zoubin

arXiv.org Machine LearningAug-9-2014

A mixture of Gaussians fit to a single curved or heavy-tailed cluster will report that the data contains many clusters. To produce more appropriate clusterings, we introduce a model which warps a latent mixture of Gaussians to produce nonparametric cluster shapes. The possibly low-dimensional latent mixture model allows us to summarize the properties of the high-dimensional clusters (or density manifolds) describing the data. The number of manifolds, as well as the shape and dimension of each manifold is automatically inferred. We derive a simple inference scheme for this model which analytically integrates out both the mixture parameters and the warping function. We show that our model is effective for density estimation, performs better than infinite Gaussian mixture models at recovering the true number of clusters, and produces interpretable summaries of high-dimensional datasets.

artificial intelligence, iwmm, machine learning, (17 more...)

arXiv.org Machine Learning

1408.2061

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.49)

Add feedback