AITopics | Clustering

Collaborating Authors

Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)

News Overviews Instructional Materials AI-Alerts Classics

On Robustness of Kernel Clustering

Yan, Bowei, Sarkar, Purnamrita

arXiv.org Machine LearningDec-1-2016

Clustering is an important problem which is prevalent in a variety of real world problems. One of the first and widely applied clustering algorithms is k-means, which was named by James MacQueen [15], but was proposed by Hugo Steinhaus [23] even before. Despite being half a century old, k-means has been widely used and analyzed under various settings. One major drawback of k-means is its incapability to separate clusters that are non-linearly separated. This can be alleviated by mapping the data to a high dimensional feature space and do clustering on top of the feature space [21, 9, 12], which is generally called kernel-based methods. For instance, the widely-used spectral clustering [22, 17] is an algorithm to calculate top eigenvectors of a kernel matrix of affinities, followed by a k-means on the top r eigenvectors. The consistency of spectral clustering is analyzed by [25].

artificial intelligence, machine learning, outlier, (17 more...)

arXiv.org Machine Learning

1606.01869

Country: North America > United States (0.46)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.89)

Add feedback

Challenge of the Week - Solution from one of the Participants

@machinelearnbotNov-30-2016, 23:45:03 GMT

The number of clusters steadily decreases (7 at 20s [ 167 iterations], 6 at 40s [ 333 iterations], 5 at the end [500 iterations]) Around the middle of the video you see that the clusters appear to be fairly stable, however more iterations result in a significant change in cluster location and number. A local minimum was detected, however it was not the global minimum. One cluster is especially small (and potentially suspect) at the end of the iterations in this simulation One of the clusters is unstable: points are exchanging between it and a nearby cluster - further iterations may reduce the number of clusters through consolidation. There is a lot more movement of points within the z dimension than along x or y. This would be worth investigating as a potential issue with the clustering algorithm or visualization - or perhaps something interesting is going on!

artificial intelligence, iteration, machine learning, (14 more...)

@machinelearnbot

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.90)

Add feedback

On the Existence of Synchrostates in Multichannel EEG Signals during Face-perception Tasks

Jamal, Wasifa, Das, Saptarshi, Maharatna, Koushik, Apicella, Fabio, Chronaki, Georgia, Sicca, Federico, Cohen, David, Muratori, Filippo

arXiv.org Machine LearningNov-29-2016

Phase synchronisation in multichannel EEG is known as the manifestation of functional brain connectivity. Traditional phase synchronisation studies are mostly based on time average synchrony measures hence do not preserve the temporal evolution of the phase difference. Here we propose a new method to show the existence of a small set of unique phase synchronised patterns or "states" in multi-channel EEG recordings, each "state" being stable of the order of ms, from typical and pathological subjects during face perception tasks. The proposed methodology bridges the concepts of EEG microstates and phase synchronisation in time and frequency domain respectively. The analysis is reported for four groups of children including typical, Autism Spectrum Disorder (ASD), low and high anxiety subjects - a total of 44 subjects. In all cases, we observe consistent existence of these states - termed as synchrostates - within specific cognition related frequency bands (beta and gamma bands), though the topographies of these synchrostates differ for different subject groups with different pathological conditions. The inter-synchrostate switching follows a well-defined sequence capturing the underlying inter-electrode phase relation dynamics in stimulus- and person-centric manner. Our study is motivated from the well-known EEG microstate exhibiting stable potential maps over the scalp. However, here we report a similar observation of quasi-stable phase synchronised states in multichannel EEG. The existence of the synchrostates coupled with their unique switching sequence characteristics could be considered as a potentially new field over contemporary EEG phase synchronisation studies.

artificial intelligence, machine learning, synchrostate, (13 more...)

arXiv.org Machine Learning

doi: 10.1088/2057-1976/1/1/015002

1611.09791

Country: Europe (0.93)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Therapeutic Area > Neurology > Autism (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.47)

Add feedback

Introduction to Machine Learning for Developers

#artificialintelligenceNov-25-2016, 17:35:13 GMT

Today's developers often hear about leveraging machine learning algorithms in order to build more intelligent applications, but many don't know where to start. One of the most important aspects of developing smart applications is to understand the underlying machine learning models, even if you aren't the person building them. Whether you are integrating a recommendation system into your app or building a chat bot, this guide will help you get started in understanding the basics of machine learning. This introduction to machine learning and list of resources is adapted from my October 2016 talk at ACT-W, a women's tech conference. Machine learning studies computer algorithms for learning to do stuff.

artificial intelligence, k-means example, machine learning, (14 more...)

#artificialintelligence

Genre: Instructional Material (0.30)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.35)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.32)

Add feedback

Clustering with Same-Cluster Queries

Ashtiani, Hassan, Kushagra, Shrinu, Ben-David, Shai

arXiv.org Machine LearningNov-22-2016

We propose a framework for Semi-Supervised Active Clustering framework (SSAC), where the learner is allowed to interact with a domain expert, asking whether two given instances belong to the same cluster or not. We study the query and computational complexity of clustering in this framework. We consider a setting where the expert conforms to a center-based clustering with a notion of margin. We show that there is a trade off between computational complexity and query complexity; We prove that for the case of $k$-means clustering (i.e., when the expert conforms to a solution of $k$-means), having access to relatively few such queries allows efficient solutions to otherwise NP hard problems. In particular, we provide a probabilistic polynomial-time (BPP) algorithm for clustering in this setting that asks $O\big(k^2\log k + k\log n)$ same-cluster queries and runs with time complexity $O\big(kn\log n)$ (where $k$ is the number of clusters and $n$ is the number of instances). The algorithm succeeds with high probability for data satisfying margin conditions under which, without queries, we show that the problem is NP hard. We also prove a lower bound on the number of queries needed to have a computationally efficient clustering algorithm in this setting.

artificial intelligence, machine learning, query, (18 more...)

arXiv.org Machine Learning

1606.02404

Country:

North America > United States (0.28)
North America > Canada (0.28)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Fast clustering algorithms for massive datasets

@machinelearnbotNov-14-2016, 04:33:39 GMT

How do you represent these keywords, with their cluster structure determined by d(A, B), in a nice graph? 10 million keywords would fit in a 3,000 x 3,000 pixels image. For those interested in graphical representations, see the Fruchterman and Rheingold algorithm, extensively used to produce graphs similar to the one below. Note that its computational complexity is O(n 3) though, so we need to very significantly improve it for this keyword clustering application - including the graphical representation. The graphical representation could be a raster image with millions of pixels, like a heat map where color represents category and, when you point to a pixel, a keyword value shows up (rather than a vector image with dozens of nodes, see graph below). Neighboring pixels would represent strongly related keywords.

artificial intelligence, data mining, machine learning, (5 more...)

@machinelearnbot

Technology:

Information Technology > Data Science > Data Mining (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.40)

Add feedback

A-Ward_p\b{eta}: Effective hierarchical clustering using the Minkowski metric and a fast k -means initialisation

de Amorim, Renato Cordeiro, Makarenkov, Vladimir, Mirkin, Boris

arXiv.org Machine LearningNov-3-2016

In this paper we make two novel contributions to hierarchical clustering. First, we introduce an anomalous pattern initialisation method for hierarchical clustering algorithms, called A-Ward, capable of substantially reducing the time they take to converge. This method generates an initial partition with a sufficiently large number of clusters. This allows the cluster merging process to start from this partition rather than from a trivial partition composed solely of singletons. Our second contribution is an extension of the Ward and Ward p algorithms to the situation where the feature weight exponent can differ from the exponent of the Minkowski distance. This new method, called A-Ward p\b{eta} , is able to generate a much wider variety of clustering solutions. We also demonstrate that its parameters can be estimated reasonably well by using a cluster validity index. We perform numerous experiments using data sets with two types of noise, insertion of noise features and blurring within-cluster values of some features. These experiments allow us to conclude: (i) our anomalous pattern initialisation method does indeed reduce the time a hierarchical clustering algorithm takes to complete, without negatively impacting its cluster recovery ability; (ii) A-Ward p\b{eta} provides better cluster recovery than both Ward and Ward p.

algorithm, artificial intelligence, machine learning, (17 more...)

arXiv.org Machine Learning

doi: 10.1016/j.ins.2016.07.076

1611.0106

Country:

Europe > United Kingdom (0.28)
North America > United States (0.28)
North America > Canada (0.28)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Covariate-assisted spectral clustering

Binkiewicz, Norbert, Vogelstein, Joshua T., Rohe, Karl

arXiv.org Machine LearningOct-30-2016

Biological and social systems consist of myriad interacting units. The interactions can be represented in the form of a graph or network. Measurements of these graphs can reveal the underlying structure of these interactions, which provides insight into the systems that generated the graphs. Moreover, in applications such as connectomics, social networks, and genomics, graph data are accompanied by contextualizing measures on each node. We utilize these node covariates to help uncover latent communities in a graph, using a modification of spectral clustering. Statistical guarantees are provided under a joint mixture model that we call the node-contextualized stochastic blockmodel, including a bound on the mis-clustering rate. The bound is used to derive conditions for achieving perfect clustering. For most simulated cases, covariate-assisted spectral clustering yields results superior to regularized spectral clustering without node covariates and to an adaptation of canonical correlation analysis. We apply our clustering method to large brain graphs derived from diffusion MRI data, using the node locations or neurological region membership as covariates. In both cases, covariate-assisted spectral clustering yields clusters that are easier to interpret neurologically.

artificial intelligence, machine learning, spectral, (17 more...)

arXiv.org Machine Learning

doi: 10.1093/biomet/asx008

1411.2158

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.67)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

Decentralized Clustering and Linking by Networked Agents

Khawatmi, Sahar, Sayed, Ali H., Zoubir, Abdelhak M.

arXiv.org Machine LearningOct-28-2016

We consider the problem of decentralized clustering and estimation over multi-task networks, where agents infer and track different models of interest. The agents do not know beforehand which model is generating their own data. They also do not know which agents in their neighborhood belong to the same cluster. We propose a decentralized clustering algorithm aimed at identifying and forming clusters of agents of similar objectives, and at guiding cooperation to enhance the inference performance. One key feature of the proposed technique is the integration of the learning and clustering tasks into a single strategy. We analyze the performance of the procedure and show that the error probabilities of types I and II decay exponentially to zero with the step-size parameter. While links between agents following different objectives are ignored in the clustering process, we nevertheless show how to exploit these links to relay critical information across the network for enhanced performance. Simulation results illustrate the performance of the proposed method in comparison to other useful techniques.

agent, artificial intelligence, machine learning, (16 more...)

arXiv.org Machine Learning

doi: 10.1109/TSP.2017.2692736

1610.09112

Country:

Europe (1.00)
North America > United States (0.67)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Communications > Networks (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.48)

Add feedback

PCM and APCM Revisited: An Uncertainty Perspective

Hou, Peixin, Deng, Hao, Yue, Jiguang, Liu, Shuguang

arXiv.org Machine LearningOct-27-2016

In this paper, we take a new look at the possibilistic c-means (PCM) and adaptive PCM (APCM) clustering algorithms from the perspective of uncertainty. This new perspective offers us insights into the clustering process, and also provides us greater degree of flexibility. We analyze the clustering behavior of PCM-based algorithms and introduce parameters $\sigma_v$ and $\alpha$ to characterize uncertainty of estimated bandwidth and noise level of the dataset respectively. Then uncertainty (fuzziness) of membership values caused by uncertainty of the estimated bandwidth parameter is modeled by a conditional fuzzy set, which is a new formulation of the type-2 fuzzy set. Experiments show that parameters $\sigma_v$ and $\alpha$ make the clustering process more easy to control, and main features of PCM and APCM are unified in this new clustering framework (UPCM). More specifically, UPCM reduces to PCM when we set a small $\alpha$ or a large $\sigma_v$, and UPCM reduces to APCM when clusters are confined in their physical clusters and possible cluster elimination are ensured. Finally we present further researches of this paper.

artificial intelligence, fuzzy logic, machine learning, (17 more...)

arXiv.org Machine Learning

1610.08624

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback