Goto

Collaborating Authors

 Clustering


Fast and Effective Algorithms for Symmetric Nonnegative Matrix Factorization

arXiv.org Machine Learning

Symmetric Nonnegative Matrix Factorization (SNMF) models arise naturally as simple reformulations of many standard clustering algorithms including the popular spectral clustering method. Recent work has demonstrated that an elementary instance of SNMF provides superior clustering quality compared to many classic clustering algorithms on a variety of synthetic and real world data sets. In this work, we present novel reformulations of this instance of SNMF based on the notion of variable splitting and produce two fast and effective algorithms for its optimization using i) the provably convergent Accelerated Proximal Gradient (APG) procedure and ii) a heuristic version of the Alternating Direction Method of Multipliers (ADMM) framework. Our two algorithms present an interesting tradeoff between computational speed and mathematical convergence guarantee: while the former method is provably convergent it is considerably slower than the latter approach, for which we also provide significant but less stringent mathematical proof regarding its convergence. Through extensive experiments we show not only that the efficacy of these approaches is equal to that of the state of the art SNMF algorithm, but also that the latter of our algorithms is extremely fast being one to two orders of magnitude faster in terms of total computation time than the state of the art approach, outperforming even spectral clustering in terms of computation time on large data sets.


Mixture model modal clustering

arXiv.org Machine Learning

The two most extended density-based approaches to clustering are surely mixture model clustering and modal clustering. In the mixture model approach, the density is represented as a mixture and clusters are associated to the different mixture components. In modal clustering, clusters are understood as regions of high density separated from each other by zones of lower density, so that they are closely related to certain regions around the density modes. If the true density is indeed in the assumed class of mixture densities, then mixture model clustering allows to scrutinize more subtle situations than modal clustering. However, when mixture modeling is used in a nonparametric way, taking advantage of the denseness of the sieve of mixture densities to approximate any density, then the correspondence between clusters and mixture components may become questionable. In this paper we introduce two methods to adopt a modal clustering point of view after a mixture model fit. Numerous examples are provided to illustrate that mixture modeling can also be used for clustering in a nonparametric sense, as long as clusters are understood as the domains of attraction of the density modes.


The LICORS Cabinet: Nonparametric Algorithms for Spatio-temporal Prediction

arXiv.org Machine Learning

Spatio-temporal data is intrinsically high dimensional, so unsupervised modeling is only feasible if we can exploit structure in the process. When the dynamics are local in both space and time, this structure can be exploited by splitting the global field into many lower-dimensional "light cones". We review light cone decompositions for predictive state reconstruction, introducing three simple light cone algorithms. These methods allow for tractable inference of spatio-temporal data, such as full-frame video. The algorithms make few assumptions on the underlying process yet have good predictive performance and can provide distributions over spatio-temporal data, enabling sophisticated probabilistic inference.


What is important to know about K Means Clustering in R?

#artificialintelligence

I don't think K-means clustering in R has any special meaning, whatever package you use the basic K-means algorithm remains the same.


On Generation of Time-based Label Refinements

arXiv.org Machine Learning

Process mining is a research field focused on the analysis of event data with the aim of extracting insights in processes. Applying process mining techniques on data from smart home environments has the potential to provide valuable insights in (un)healthy habits and to contribute to ambient assisted living solutions. Finding the right event labels to enable application of process mining techniques is however far from trivial, as simply using the triggering sensor as the label for sensor events results in uninformative models that allow for too much behavior (overgeneralizing). Refinements of sensor level event labels suggested by domain experts have shown to enable discovery of more precise and insightful process models. However, there exist no automated approach to generate refinements of event labels in the context of process mining. In this paper we propose a framework for automated generation of label refinements based on the time attribute of events. We show on a case study with real life smart home event data that behaviorally more specific, and therefore more insightful, process models can be found by using automatically generated refined labels in process discovery.


A Greedy Algorithm to Cluster Specialists

arXiv.org Machine Learning

Several recent deep neural networks experiments leverage the generalist-specialist paradigm for classification. However, no formal study compared the performance of different clustering algorithms for class assignment. In this paper we perform such a study, suggest slight modifications to the clustering procedures, and propose a novel algorithm designed to optimize the performance of of the specialist-generalist classification system. Our experiments on the CIFAR-10 and CIFAR-100 datasets allow us to investigate situations for varying number of classes on similar data. We find that our \emph{greedy pairs} clustering algorithm consistently outperforms other alternatives, while the choice of the confusion matrix has little impact on the final performance.


A Simple Approach to Sparse Clustering

arXiv.org Machine Learning

Consider the problem of sparse clustering, where it is assumed that only a subset of the features are useful for clustering purposes. In the framework of the COSA method of Friedman and Meulman, subsequently improved in the form of the Sparse K-means method of Witten and Tibshirani, a natural and simpler hill-climbing approach is introduced. The new method is shown to be competitive with these two methods and others. Keywords: Sparse Clustering, Hill-climbing, High-dimensional, Feature Selection 1. Introduction Consider a typical setting for clusteringn items based on pairwise dissimilarities, withδ(i,j) denoting the dissimilarity between itemsi,j [n ] {1,...,n } . For concreteness, we assume thatδ(i,j) 0 and δ(i,i) 0 for all i,j [n ] . In principle, if we want to delineateκ clusters, the goal is (for example) to minimize the average within-cluster dissimilarity. Let C n κ denote the class of clusterings ofn items intoκ groups. For C C n κ, its average within-cluster dissimilarity is defined as [C ] k [κ ] 1 C 1 (k) i,j C 1 (k)δ(i,j). If under the Euclidean setting, we further define cluster centers µ k 1 n i C 1 (k)x i with k [κ ], (2) then the within-cluster dissimilarity can be rewritten as follows, [C ] k [κ ] 1 C 1 (k) i,j C 1 (k) x i x j 2 k [κ ] i C 1 (k) x i µ k 2 . The resulting optimization problem is the following: Given (δ(i,j) i,j [n ]), minimize [C ] over C C n κ .


scikit-learn and Game of Thrones - DZone Big Data

#artificialintelligence

In my last post, I showed how to find similar Game of Thrones episodes based on the characters that appear in different episodes. This allowed us to find similar episodes on an episode by episode basis, but I was curious whether there were groups of similar episodes that we could identify. A clustering algorithm groups similar documents together, where similarity is based on calculating a'distance' between documents. Documents separated by a small distance would be in the same cluster, whereas if there's a large distance between episodes then they'd probably be in different clusters. The KMeans algorithm clusters data by trying to separate samples in n groups of equal variance, minimizing a criterion known as the inertia or within-cluster sum-of-squares.


Distributed Processing of Biosignal-Database for Emotion Recognition with Mahout

arXiv.org Machine Learning

There are many popular emotion definitions and models, both in terms of discrete emotion subsets, as well as mappings in two-and three-dimensional spaces. In this work, we assume a 3D emotional model. With increasing interest in the area, new and large datasets are being collected, enabling new insights to be discovered in the area. These datasets necessitate distributed processing for enhanced scalability and performance. Popular distributed machine learning libraries can augment the process of training accurate classifiers offline, to build prediction models based on large amounts of data. We used Mahout on distributed mode to train a random forest classifier, on the DEAP dataset. Using a distributed approach allowed us to both process the data in reasonable time and conduct many iterations to experiment with different model parameters and convergence criteria.


Functorial Hierarchical Clustering with Overlaps

arXiv.org Machine Learning

This work draws its inspiration from three important sources of research on dissimilarity-based clustering and intertwines those three threads into a consistent principled functorial theory of clustering. Those three are the overlapping clustering of Jardine and Sibson, the functorial approach of Carlsson and Mémoli to partition-based clustering, and the Isbell/Dress school's study of injective envelopes. Carlsson and Mémoli introduce the idea of viewing clustering methods as functors from a category of metric spaces to a category of clusters, with functoriality subsuming many desirable properties. Our first series of results extends their theory of functorial clustering schemes to methods that allow overlapping clusters in the spirit of Jardine and Sibson. This obviates some of the unpleasant effects of chaining that occur, for example with single-linkage clustering. We prove an equivalence between these general overlapping clustering functors and projections of weight spaces to what we term clustering domains, by focusing on the order structure determined by the morphisms. As a specific application of this machinery, we are able to prove that there are no functorial projections to cut metrics, or even to tree metrics. Finally, although we focus less on the construction of clustering methods (clustering domains) derived from injective envelopes, we lay out some preliminary results, that hopefully will give a feel for how the third leg of the stool comes into play.