AITopics | Clustering


Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)


Fast and Effective Algorithms for Symmetric Nonnegative Matrix Factorization

Borhani, Reza, Watt, Jeremy, Katsaggelos, Aggelos

arXiv.org Machine Learning, Sep-17-2016

Symmetric Nonnegative Matrix Factorization (SNMF) models arise naturally as simple reformulations of many standard clustering algorithms, including the popular spectral clustering method. Recent work has demonstrated that an elementary instance of SNMF provides superior clustering quality compared to many classic clustering algorithms on a variety of synthetic and real-world data sets. In this work, we present novel reformulations of this instance of SNMF based on the notion of variable splitting and produce two fast and effective algorithms for its optimization using i) the provably convergent Accelerated Proximal Gradient (APG) procedure and ii) a heuristic version of the Alternating Direction Method of Multipliers (ADMM) framework. Our two algorithms present an interesting tradeoff between computational speed and mathematical convergence guarantees: while the former method is provably convergent, it is considerably slower than the latter approach, for which we also provide a significant but less stringent mathematical proof of convergence. Through extensive experiments we show not only that the efficacy of these approaches is equal to that of the state-of-the-art SNMF algorithm, but also that the latter of our algorithms is extremely fast, being one to two orders of magnitude faster in total computation time than the state-of-the-art approach and outperforming even spectral clustering in computation time on large data sets.
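A minimal sketch of the optimization problem, assuming that the "elementary instance" of SNMF is the Frobenius-norm problem $\min_{W \ge 0} \|A - WW^T\|_F^2$ for a symmetric affinity matrix $A$, as in the SymNMF literature. It uses plain projected gradient descent with a backtracking step size, which is neither the APG nor the ADMM algorithm of the paper; the function names are hypothetical. Cluster labels are read off as the largest entry in each row of $W$, a common convention for SNMF-based clustering.

```python
import numpy as np


def snmf_objective(A, W):
    """Frobenius-norm SNMF objective ||A - W W^T||_F^2."""
    R = A - W @ W.T
    return float(np.sum(R * R))


def snmf_projected_gradient(A, k, n_iter=500, step=1.0, seed=0):
    """Projected gradient for min_{W >= 0} ||A - W W^T||_F^2 (hypothetical baseline).

    Not the APG or ADMM method of the paper. The step is halved until the
    objective does not increase, so the recorded history is non-increasing.
    """
    rng = np.random.default_rng(seed)
    W = rng.random((A.shape[0], k))
    history = [snmf_objective(A, W)]
    for _ in range(n_iter):
        # Gradient of ||A - W W^T||_F^2 when A is symmetric.
        grad = 4.0 * (W @ W.T - A) @ W
        t = step
        while t > 1e-12:
            W_new = np.maximum(W - t * grad, 0.0)  # project onto W >= 0
            f_new = snmf_objective(A, W_new)
            if f_new <= history[-1]:
                break
            t *= 0.5  # backtrack
        else:
            break  # no non-increasing step found: stop early
        W = W_new
        history.append(f_new)
    return W, history


if __name__ == "__main__":
    # Toy affinity matrix with two blocks of three mutually similar items.
    A = np.kron(np.eye(2), np.ones((3, 3)))
    W, history = snmf_projected_gradient(A, k=2)
    print("labels:", np.argmax(W, axis=1), "final objective:", history[-1])
```

The monotone history comes from the backtracking rule, not from any property of the paper's algorithms, and this sketch says nothing about the speed comparisons reported above.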

algorithm, artificial intelligence, machine learning

arXiv.org Machine Learning

1609.05342

Genre: Research Report > Promising Solution (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)


Mixture model modal clustering

Chacón, José E.

arXiv.org Machine Learning, Sep-15-2016

The two most widespread density-based approaches to clustering are surely mixture model clustering and modal clustering. In the mixture model approach, the density is represented as a mixture and clusters are associated with the different mixture components. In modal clustering, clusters are understood as regions of high density separated from each other by zones of lower density, so that they are closely related to certain regions around the density modes. If the true density is indeed in the assumed class of mixture densities, then mixture model clustering allows one to scrutinize more subtle situations than modal clustering. However, when mixture modeling is used in a nonparametric way, taking advantage of the denseness of the sieve of mixture densities to approximate any density, the correspondence between clusters and mixture components may become questionable. In this paper we introduce two methods to adopt a modal clustering point of view after a mixture model fit. Numerous examples are provided to illustrate that mixture modeling can also be used for clustering in a nonparametric sense, as long as clusters are understood as the domains of attraction of the density modes.
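The difference between "one cluster per component" and "one cluster per mode" can be seen by sending every point uphill on a fitted mixture density and grouping points by the mode they reach. This is a generic illustration of clusters as domains of attraction of modes, not either of the two methods introduced in the paper, and the function names are hypothetical. It assumes a one-dimensional Gaussian mixture whose components share one standard deviation `sigma`; under that assumption the condition $f'(x) = 0$ reads $x = \sum_k p(k \mid x)\,\mu_k$, which is the Gaussian mean-shift update with the component means as kernel centres.

```python
import numpy as np


def ascend_to_modes(x, means, weights, sigma, n_iter=200):
    """Move each 1-D point uphill on a Gaussian mixture density.

    Assumes all components share the standard deviation `sigma`, so the
    fixed-point update x <- sum_k p(k | x) * mu_k is a mean-shift step.
    """
    x = np.asarray(x, dtype=float).copy()
    means = np.asarray(means, dtype=float)
    log_w = np.log(np.asarray(weights, dtype=float))
    for _ in range(n_iter):
        # log(pi_k * N(x; mu_k, sigma^2)) up to a constant shared by all k
        log_r = log_w[None, :] - 0.5 * ((x[:, None] - means[None, :]) / sigma) ** 2
        log_r -= log_r.max(axis=1, keepdims=True)  # numerical stability
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)  # responsibilities p(k | x)
        x = r @ means
    return x


def group_by_mode(modes, tol=1e-3):
    """Give points that reached the same mode (within `tol`) the same label."""
    centres, labels = [], []
    for m in modes:
        for j, c in enumerate(centres):
            if abs(m - c) < tol:
                labels.append(j)
                break
        else:
            centres.append(m)
            labels.append(len(centres) - 1)
    return np.array(labels)


points = [-2.0, -0.3, 0.3, 2.0]
# Two well-separated components: two modes, hence two clusters.
print(group_by_mode(ascend_to_modes(points, [-5, 5], [0.5, 0.5], 1.0)))  # [0 0 1 1]
# Two heavily overlapping components: the mixture is unimodal, hence one cluster.
print(group_by_mode(ascend_to_modes(points, [-0.5, 0.5], [0.5, 0.5], 1.0)))  # [0 0 0 0]
```

In the second call the mixture has two components but only one mode, so the modal view returns a single cluster where the component view would return two.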

artificial intelligence, machine learning, modal

arXiv.org Machine Learning

1609.04721

Genre: Research Report (0.64)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)


The LICORS Cabinet: Nonparametric Algorithms for Spatio-temporal Prediction

Montanez, George D., Shalizi, Cosma Rohilla

arXiv.org Machine Learning, Sep-14-2016

Spatio-temporal data is intrinsically high-dimensional, so unsupervised modeling is only feasible if we can exploit structure in the process. When the dynamics are local in both space and time, this structure can be exploited by splitting the global field into many lower-dimensional "light cones". We review light cone decompositions for predictive state reconstruction, introducing three simple light cone algorithms. These methods allow for tractable inference of spatio-temporal data, such as full-frame video. The algorithms make few assumptions about the underlying process, yet they have good predictive performance and can provide distributions over spatio-temporal data, enabling sophisticated probabilistic inference.
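To make the light cone idea concrete, the sketch below extracts the past light cone of one space-time point from a field with a single spatial dimension. The conventions are assumptions, not necessarily the paper's: the field is indexed `field[time, space]`, the cone includes the present value, `horizon` counts past time steps, and `speed` is an integer number of sites per time step. The LICORS algorithms then group such cones by their predictive distributions, which this sketch does not attempt; the function name is hypothetical.

```python
import numpy as np


def past_light_cone(field, t, x, horizon, speed=1):
    """Flatten the past light cone of site x at time t in a 1-D field.

    The cone holds every field[t - lag, x + dx] with 0 <= lag <= horizon
    and |dx| <= speed * lag, ordered by lag and then by position.
    """
    n_t, n_x = field.shape
    reach = speed * horizon
    if t >= n_t or t - horizon < 0 or x - reach < 0 or x + reach >= n_x:
        raise ValueError("light cone extends past the edge of the field")
    values = []
    for lag in range(horizon + 1):
        r = speed * lag  # the cone widens by `speed` sites per step back in time
        values.extend(field[t - lag, x - r : x + r + 1])
    return np.array(values)


field = np.arange(35).reshape(5, 7)  # toy field: 5 time steps, 7 sites
print(past_light_cone(field, t=4, x=3, horizon=2))  # [31 23 24 25 15 16 17 18 19]
```

With `speed=1` a cone of horizon `h` has `(h + 1) ** 2` entries, which is the "lower-dimensional" object the abstract refers to, compared with the whole field.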

artificial intelligence, light cone, machine learning

arXiv.org Machine Learning

1506.02686

Country: North America > United States (1.00)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.94)


What is important to know about K Means Clustering in R?

#artificialintelligence, Sep-12-2016, 22:00:44 GMT

I don't think K-means clustering in R has any special meaning: whatever package you use, the basic K-means algorithm remains the same.
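For reference, here is the basic (Lloyd) iteration as a minimal Python sketch, not any particular package's code. Implementations do differ in details such as the update variant (R's base `kmeans()` defaults to the Hartigan-Wong algorithm and also offers `"Lloyd"` through its `algorithm` argument), the initialization and the number of random restarts, so results can vary across packages even though the within-cluster sum of squares being minimized is the same.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize the centroids with k distinct data points chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest centroid in squared Euclidean distance.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: move each non-empty cluster's centroid to its mean.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    # Within-cluster sum of squares for the final assignment.
    inertia = float(((X - centroids[labels]) ** 2).sum())
    return labels, centroids, inertia


if __name__ == "__main__":
    X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
    labels, centroids, inertia = kmeans(X, k=2)
    print(labels, inertia)
```

Because Lloyd's algorithm only finds a local optimum, packages typically rerun it from several random starts and keep the solution with the lowest within-cluster sum of squares.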

artificial intelligence, machine learning, means clustering

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.86)


On Generation of Time-based Label Refinements

Tax, Niek, Alasgarov, Emin, Sidorova, Natalia, Haakma, Reinder

arXiv.org Machine Learning, Sep-12-2016

Process mining is a research field focused on the analysis of event data with the aim of extracting insights into processes. Applying process mining techniques to data from smart home environments has the potential to provide valuable insights into (un)healthy habits and to contribute to ambient assisted living solutions. Finding the right event labels to enable the application of process mining techniques is, however, far from trivial, as simply using the triggering sensor as the label for sensor events results in uninformative models that allow for too much behavior (overgeneralizing). Refinements of sensor-level event labels suggested by domain experts have been shown to enable the discovery of more precise and insightful process models. However, there exists no automated approach to generate refinements of event labels in the context of process mining. In this paper we propose a framework for the automated generation of label refinements based on the time attribute of events. We show in a case study with real-life smart home event data that behaviorally more specific, and therefore more insightful, process models can be found by using automatically generated refined labels in process discovery.
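The core idea of a time-based label refinement can be sketched as follows: for each sensor label, cluster the time of day at which its events occur and append the cluster index to the label. This is a hypothetical illustration, not the paper's framework; it assumes hours are mapped onto the unit circle (so that 23:30 and 00:30 count as close) and clustered with a few k-means iterations from a deterministic farthest-first start. The function name and the `label@cluster` naming scheme are made up for the example.

```python
from collections import defaultdict

import numpy as np


def refine_labels_by_time(events, k=2, n_iter=20):
    """Split each sensor label into up to k time-of-day variants.

    `events` is a list of (label, hour) pairs with hour in [0, 24);
    n_iter must be at least 1.
    """
    by_label = defaultdict(list)
    for idx, (label, _) in enumerate(events):
        by_label[label].append(idx)

    refined = [None] * len(events)
    for label, idxs in by_label.items():
        # Embed hours on the unit circle so that midnight wraps around.
        angles = 2 * np.pi * np.array([events[i][1] for i in idxs]) / 24.0
        pts = np.column_stack([np.cos(angles), np.sin(angles)])
        n_clusters = min(k, len(pts))
        # Deterministic farthest-first initialization of the centres.
        centres = [pts[0]]
        while len(centres) < n_clusters:
            d2 = np.min([((pts - c) ** 2).sum(axis=1) for c in centres], axis=0)
            centres.append(pts[np.argmax(d2)])
        centres = np.array(centres)
        for _ in range(n_iter):  # k-means on the circle embedding
            d2 = ((pts[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
            assign = d2.argmin(axis=1)
            for j in range(n_clusters):
                if np.any(assign == j):
                    centres[j] = pts[assign == j].mean(axis=0)
        for i, a in zip(idxs, assign):
            refined[i] = f"{label}@{a}"
    return refined


events = [("Kitchen", 7.0), ("Kitchen", 7.5), ("Kitchen", 19.0), ("Kitchen", 18.5),
          ("Bed", 23.5), ("Bed", 0.5), ("Bed", 13.0)]
print(refine_labels_by_time(events))
```

Here the kitchen events split into a morning and an evening variant, and the late-night bed events stay together across midnight, which is the kind of behaviorally more specific label the abstract describes.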

artificial intelligence, data mining, machine learning

arXiv.org Machine Learning

1609.03333

Country: Europe > Netherlands (0.14)

Genre: Research Report (0.50)

Industry:

Health & Medicine (1.00)
Information Technology > Smart Houses & Appliances (0.57)
Materials > Metals & Mining (0.54)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)


A Greedy Algorithm to Cluster Specialists

Arnold, Sébastien

arXiv.org Machine Learning, Sep-12-2016

Several recent deep neural network experiments leverage the generalist-specialist paradigm for classification. However, no formal study has compared the performance of different clustering algorithms for class assignment. In this paper we perform such a study, suggest slight modifications to the clustering procedures, and propose a novel algorithm designed to optimize the performance of the specialist-generalist classification system. Our experiments on the CIFAR-10 and CIFAR-100 datasets allow us to investigate situations with varying numbers of classes on similar data. We find that our "greedy pairs" clustering algorithm consistently outperforms other alternatives, while the choice of the confusion matrix has little impact on the final performance.
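Clustering classes for specialists usually starts from a confusion matrix, on the reasoning that classes the generalist often mixes up belong to the same specialist. The sketch below shows one simple way to do this, a greedy agglomeration that repeatedly merges the two class groups with the largest total symmetric confusion. It is a hypothetical illustration and not the paper's "greedy pairs" algorithm, whose details the abstract does not give; it also assumes `confusion[i][j]` counts examples of class `i` predicted as class `j`.

```python
import numpy as np


def group_confused_classes(confusion, n_groups):
    """Greedily merge the class groups that confuse each other most.

    Hypothetical illustration: joins the two groups with the largest total
    symmetric confusion until `n_groups` groups remain.
    """
    C = np.asarray(confusion, dtype=float)
    if not 1 <= n_groups <= len(C):
        raise ValueError("n_groups must be between 1 and the number of classes")
    S = C + C.T  # i mistaken for j plus j mistaken for i
    np.fill_diagonal(S, 0.0)  # correct predictions are not confusions
    groups = [{i} for i in range(len(S))]
    while len(groups) > n_groups:
        best = None
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                score = sum(S[i, j] for i in groups[a] for j in groups[b])
                if best is None or score > best[0]:
                    best = (score, a, b)
        _, a, b = best
        groups[a] |= groups[b]  # merge group b into group a
        del groups[b]
    return [sorted(g) for g in groups]


confusion = [[50, 10, 1, 0],
             [8, 50, 0, 1],
             [1, 0, 50, 12],
             [0, 2, 9, 50]]
print(group_confused_classes(confusion, n_groups=2))  # [[0, 1], [2, 3]]
```

Without a size constraint this rule can produce unbalanced groups; a real specialist system would likely cap group sizes or allow overlaps.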

artificial intelligence, machine learning, specialist

arXiv.org Machine Learning

1609.03666

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.74)


A Simple Approach to Sparse Clustering

Arias-Castro, Ery, Pu, Xiao

arXiv.org Machine Learning, Sep-11-2016

Consider the problem of sparse clustering, where it is assumed that only a subset of the features is useful for clustering purposes. In the framework of the COSA method of Friedman and Meulman, subsequently improved in the form of the Sparse K-means method of Witten and Tibshirani, a natural and simpler hill-climbing approach is introduced. The new method is shown to be competitive with these two methods and others.

Keywords: Sparse Clustering, Hill-climbing, High-dimensional, Feature Selection

1. Introduction

Consider a typical setting for clustering $n$ items based on pairwise dissimilarities, with $\delta(i,j)$ denoting the dissimilarity between items $i, j \in [n] := \{1, \dots, n\}$. For concreteness, we assume that $\delta(i,j) \ge 0$ and $\delta(i,i) = 0$ for all $i, j \in [n]$. In principle, if we want to delineate $\kappa$ clusters, the goal is (for example) to minimize the average within-cluster dissimilarity. Let $\mathcal{C}_{n,\kappa}$ denote the class of clusterings of $n$ items into $\kappa$ groups. For $C \in \mathcal{C}_{n,\kappa}$, its average within-cluster dissimilarity is defined as

$$\Delta[C] = \sum_{k \in [\kappa]} \frac{1}{|C^{-1}(k)|} \sum_{i,j \in C^{-1}(k)} \delta(i,j). \qquad (1)$$

In the Euclidean setting, where $\delta(i,j) = \|x_i - x_j\|^2$, we further define the cluster centers

$$\mu_k = \frac{1}{|C^{-1}(k)|} \sum_{i \in C^{-1}(k)} x_i, \qquad k \in [\kappa], \qquad (2)$$

and the within-cluster dissimilarity can then be rewritten as

$$\Delta[C] = \sum_{k \in [\kappa]} \frac{1}{|C^{-1}(k)|} \sum_{i,j \in C^{-1}(k)} \|x_i - x_j\|^2 = 2 \sum_{k \in [\kappa]} \sum_{i \in C^{-1}(k)} \|x_i - \mu_k\|^2.$$

The resulting optimization problem is the following: given $(\delta(i,j) : i, j \in [n])$, minimize $\Delta[C]$ over $C \in \mathcal{C}_{n,\kappa}$.
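The Euclidean rewrite of the average within-cluster dissimilarity can be checked numerically. With $\delta(i,j) = \|x_i - x_j\|^2$ summed over ordered pairs $(i, j)$, each cluster contributes $2\,|C^{-1}(k)| \sum_i \|x_i - \mu_k\|^2$ before the $1/|C^{-1}(k)|$ normalization, which is where the factor 2 comes from. The sketch below compares both forms on random data; the function names are hypothetical.

```python
import numpy as np


def pairwise_within_cluster(X, labels):
    """Average within-cluster dissimilarity with delta(i, j) = ||x_i - x_j||^2."""
    labels = np.asarray(labels)
    total = 0.0
    for k in np.unique(labels):
        Xk = X[labels == k]
        diffs = Xk[:, None, :] - Xk[None, :, :]  # all ordered pairs (i, j)
        total += (diffs ** 2).sum() / len(Xk)
    return total


def centroid_within_cluster(X, labels):
    """Twice the sum of squared distances to each cluster's center mu_k."""
    labels = np.asarray(labels)
    total = 0.0
    for k in np.unique(labels):
        Xk = X[labels == k]
        total += ((Xk - Xk.mean(axis=0)) ** 2).sum()
    return 2.0 * total


rng = np.random.default_rng(1)
X = rng.normal(size=(30, 4))
labels = rng.integers(0, 3, size=30)
print(np.isclose(pairwise_within_cluster(X, labels), centroid_within_cluster(X, labels)))
```

The identity is what lets a pairwise-dissimilarity objective be optimized with K-means-style center updates in the Euclidean case.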

artificial intelligence, machine learning, optimization problem

arXiv.org Machine Learning

doi: 10.1016/j.csda.2016.08.003

1602.07277

Country: North America > United States > California (0.28)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.47)
Health & Medicine > Therapeutic Area > Oncology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)


scikit-learn and Game of Thrones - DZone Big Data

#artificialintelligence, Sep-10-2016, 18:00:39 GMT

In my last post, I showed how to find similar Game of Thrones episodes based on the characters that appear in different episodes. This allowed us to find similar episodes on an episode-by-episode basis, but I was curious whether there were groups of similar episodes that we could identify. A clustering algorithm groups similar documents together, where similarity is based on calculating a 'distance' between documents. Documents separated by a small distance would be in the same cluster, whereas if there's a large distance between episodes then they'd probably be in different clusters. The KMeans algorithm clusters data by trying to separate samples into n groups of equal variance, minimizing a criterion known as the inertia or within-cluster sum-of-squares.
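To see what "inertia" measures in this setting, the sketch below turns episodes into binary character-appearance vectors and computes the within-cluster sum of squares for two candidate groupings; KMeans searches for the grouping that makes this number small (scikit-learn reports it as `inertia_` on a fitted `KMeans`). The episodes and characters are hypothetical toy data, not actual Game of Thrones appearances, and the function names are illustrative.

```python
import numpy as np


def episode_matrix(episodes):
    """Binary episode-by-character matrix from {episode: set_of_characters}."""
    names = list(episodes)
    chars = sorted(set().union(*episodes.values()))
    X = np.array([[1.0 if c in episodes[e] else 0.0 for c in chars] for e in names])
    return names, chars, X


def inertia(X, labels):
    """Within-cluster sum of squared distances to each cluster's mean."""
    labels = np.asarray(labels)
    total = 0.0
    for k in np.unique(labels):
        members = X[labels == k]
        total += float(((members - members.mean(axis=0)) ** 2).sum())
    return total


# Hypothetical toy data: which (made-up) characters appear in which episode.
episodes = {"ep1": {"a", "b"}, "ep2": {"a", "b"}, "ep3": {"c"}, "ep4": {"c", "d"}}
names, chars, X = episode_matrix(episodes)
print(inertia(X, [0, 0, 1, 1]))  # 0.5
print(inertia(X, [0, 1, 0, 1]))  # 3.5
```

Grouping episodes that share characters gives the lower inertia, which is why KMeans on these vectors tends to put them in the same cluster.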

artificial intelligence, data mining, machine learning

#artificialintelligence

Industry:

Media > Television (0.64)
Leisure & Entertainment (0.64)

Technology:

Information Technology > Data Science > Data Mining (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.60)


Distributed Processing of Biosignal-Database for Emotion Recognition with Mahout

Kollia, Varvara, Elibol, Oguz H.

arXiv.org Machine Learning, Sep-8-2016

There are many popular emotion definitions and models, both in terms of discrete emotion subsets and of mappings into two- and three-dimensional spaces. In this work, we assume a 3D emotional model. With increasing interest in the area, new and large datasets are being collected, enabling new insights to be discovered. These datasets necessitate distributed processing for enhanced scalability and performance. Popular distributed machine learning libraries can augment the process of training accurate classifiers offline, to build prediction models based on large amounts of data. We used Mahout in distributed mode to train a random forest classifier on the DEAP dataset. Using a distributed approach allowed us both to process the data in reasonable time and to conduct many iterations to experiment with different model parameters and convergence criteria.

artificial intelligence, machine learning, random forest classifier

arXiv.org Machine Learning

1609.02631

Genre: Research Report (1.00)

Industry: Health & Medicine (0.33)

Technology:

Information Technology > Artificial Intelligence > Cognitive Science > Emotion (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.50)


Functorial Hierarchical Clustering with Overlaps

Culbertson, Jared, Guralnik, Dan P., Stiller, Peter F.

arXiv.org Machine Learning, Sep-8-2016

This work draws its inspiration from three important sources of research on dissimilarity-based clustering and intertwines those three threads into a consistent, principled functorial theory of clustering. The three are the overlapping clustering of Jardine and Sibson, the functorial approach of Carlsson and Mémoli to partition-based clustering, and the Isbell/Dress school's study of injective envelopes. Carlsson and Mémoli introduce the idea of viewing clustering methods as functors from a category of metric spaces to a category of clusters, with functoriality subsuming many desirable properties. Our first series of results extends their theory of functorial clustering schemes to methods that allow overlapping clusters in the spirit of Jardine and Sibson. This obviates some of the unpleasant effects of chaining that occur, for example, with single-linkage clustering. We prove an equivalence between these general overlapping clustering functors and projections of weight spaces to what we term clustering domains, by focusing on the order structure determined by the morphisms. As a specific application of this machinery, we are able to prove that there are no functorial projections to cut metrics, or even to tree metrics. Finally, although we focus less on the construction of clustering methods (clustering domains) derived from injective envelopes, we lay out some preliminary results that will, we hope, give a feel for how the third leg of the stool comes into play.
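The chaining effect of single-linkage clustering is easy to reproduce. The sketch below computes single-linkage clusters at a fixed scale `r` as the connected components of the graph that joins any two points within distance `r`, using a small union-find over all pairs. It illustrates chaining only and does not implement the overlapping clustering functors of the paper; the function name is hypothetical.

```python
import numpy as np


def single_linkage_at_scale(points, r):
    """Clusters = connected components of the graph joining points within distance r."""
    pts = np.asarray(points, dtype=float)
    if pts.ndim == 1:
        pts = pts[:, None]  # treat a flat list as 1-D points
    n = len(pts)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(pts[i] - pts[j]) <= r:
                parent[find(i)] = find(j)  # union the two components
    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    relabel = {}
    return [relabel.setdefault(find(i), len(relabel)) for i in range(n)]


print(single_linkage_at_scale([0, 1, 2, 3, 10], r=1.0))  # [0, 0, 0, 0, 1]
```

With `r = 1.0`, the points 0 and 3 land in the same cluster although they are 3 apart, because each consecutive pair is within `r`; overlapping clusterings in the spirit of Jardine and Sibson are one way to soften this effect.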

artificial intelligence, category, machine learning

arXiv.org Machine Learning

1609.02513

Country: North America > United States > California (0.28)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
