AITopics | Clustering

Collaborating Authors

Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)

News Overviews Instructional Materials AI-Alerts Classics

On the Consistency of Quick Shift

Jiang, Heinrich

Neural Information Processing SystemsDec-31-2017

Quick Shift is a popular mode-seeking and clustering algorithm. We present finite sample statistical consistency guarantees for Quick Shift on mode and cluster recovery under mild distributional assumptions. We then apply our results to construct a consistent modal regression algorithm.

artificial intelligence, machine learning, quick shift, (18 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.66)

Add feedback

Deep Subspace Clustering Networks

Ji, Pan, Zhang, Tong, Li, Hongdong, Salzmann, Mathieu, Reid, Ian

Neural Information Processing SystemsDec-31-2017

We present a novel deep neural network architecture for unsupervised subspace clustering. This architecture is built upon deep auto-encoders, which non-linearly map the input data into a latent space. Our key idea is to introduce a novel self-expressive layer between the encoder and the decoder to mimic the "self-expressiveness" property that has proven effective in traditional subspace clustering. Being differentiable, our new self-expressive layer provides a simple but effective way to learn pairwise affinities between all data points through a standard back-propagation procedure. Being nonlinear, our neural-network based method is able to cluster data points having complex (often nonlinear) structures. We further propose pre-training and fine-tuning strategies that let us effectively learn the parameters of our subspace clustering networks. Our experiments show that the proposed method significantly outperforms the state-of-the-art unsupervised subspace clustering methods.

artificial intelligence, machine learning, subspace, (16 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

The information bottleneck and geometric clustering

Strouse, D J, Schwab, David J

arXiv.org Machine LearningDec-27-2017

The information bottleneck (IB) approach to clustering takes a joint distribution $P\!\left(X,Y\right)$ and maps the data $X$ to cluster labels $T$ which retain maximal information about $Y$ (Tishby et al., 1999). This objective results in an algorithm that clusters data points based upon the similarity of their conditional distributions $P\!\left(Y\mid X\right)$. This is in contrast to classic "geometric clustering" algorithms such as $k$-means and gaussian mixture models (GMMs) which take a set of observed data points $\left\{ \mathbf{x}_{i}\right\}_{i=1:N}$ and cluster them based upon their geometric (typically Euclidean) distance from one another. Here, we show how to use the deterministic information bottleneck (DIB) (Strouse and Schwab, 2017), a variant of IB, to perform geometric clustering, by choosing cluster labels that preserve information about data point location on a smoothed dataset. We also introduce a novel intuitive method to choose the number of clusters, via kinks in the information curve. We apply this approach to a variety of simple clustering problems, showing that DIB with our model selection procedure recovers the generative cluster labels. We also show that, for one simple case, DIB interpolates between the cluster boundaries of GMMs and $k$-means in the large data limit. Thus, our IB approach to clustering also provides an information-theoretic perspective on these classic algorithms.

artificial intelligence, information, machine learning, (15 more...)

arXiv.org Machine Learning

1712.09657

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Deep Unsupervised Clustering Using Mixture of Autoencoders

Zhang, Dejiao, Sun, Yifan, Eriksson, Brian, Balzano, Laura

arXiv.org Machine LearningDec-26-2017

Unsupervised clustering is one of the most fundamental challenges in machine learning. A popular hypothesis is that data are generated from a union of low-dimensional nonlinear manifolds; thus an approach to clustering is identifying and separating these manifolds. In this paper, we present a novel approach to solve this problem by using a mixture of autoencoders. Our model consists of two parts: 1) a collection of autoencoders where each autoencoder learns the underlying manifold of a group of similar objects, and 2) a mixture assignment neural network, which takes the concatenated latent vectors from the autoencoders as input and infers the distribution over clusters. By jointly optimizing the two parts, we simultaneously assign data to clusters and learn the underlying manifolds of each cluster.

artificial intelligence, autoencoder, machine learning, (15 more...)

arXiv.org Machine Learning

1712.07788

Country: North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Merging $K$-means with hierarchical clustering for identifying general-shaped groups

Peterson, Anna D., Ghosh, Arka P., Maitra, Ranjan

arXiv.org Machine LearningDec-23-2017

Clustering partitions a dataset such that observations placed together in a group are similar but different from those in other groups. Hierarchical and $K$-means clustering are two approaches but have different strengths and weaknesses. For instance, hierarchical clustering identifies groups in a tree-like structure but suffers from computational complexity in large datasets while $K$-means clustering is efficient but designed to identify homogeneous spherically-shaped clusters. We present a hybrid non-parametric clustering approach that amalgamates the two methods to identify general-shaped clusters and that can be applied to larger datasets. Specifically, we first partition the dataset into spherical groups using $K$-means. We next merge these groups using hierarchical methods with a data-driven distance measure as a stopping criterion. Our proposal has the potential to reveal groups with general shapes and structure in a dataset. We demonstrate good performance on several simulated and real datasets.

artificial intelligence, dataset, machine learning, (16 more...)

arXiv.org Machine Learning

doi: 10.1002/sta4.172

1712.08786

Country:

Europe > Italy (1.00)
North America > United States (0.93)

Genre: Research Report (0.50)

Industry: Health & Medicine (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

On the Consistency of Quick Shift

Jiang, Heinrich

arXiv.org Machine LearningDec-23-2017

artificial intelligence, machine learning, quick shift, (18 more...)

arXiv.org Machine Learning

1710.10646

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.66)

Add feedback

Model-Based Clustering of Nonparametric Weighted Networks

Agarwal, Amal, Xue, Lingzhou

arXiv.org Machine LearningDec-21-2017

Water pollution is a major global environmental problem, and it poses a great environmental risk to public health and biological diversity. This work is motivated by assessing the potential environmental threat of coal mining through increased sulfate concentrations in river networks, which do not belong to any simple parametric distribution. However, existing network models mainly focus on binary or discrete networks and weighted networks with known parametric weight distributions. We propose a principled nonparametric weighted network model based on exponential-family random graph models and local likelihood estimation and study its model-based clustering with application to large-scale water pollution network analysis. We do not require any parametric distribution assumption on network weights. The proposed method greatly extends the methodology and applicability of statistical network models. Furthermore, it is scalable to large and complex networks in large-scale environmental studies and geoscientific research. The power of our proposed methods is demonstrated in simulation studies.

artificial intelligence, elbo, machine learning, (15 more...)

arXiv.org Machine Learning

1712.078

Country: North America > United States (1.00)

Genre: Research Report (0.40)

Industry:

Health & Medicine > Therapeutic Area (0.93)
Law > Environmental Law (0.88)
Water & Waste Management > Water Management > Water Supplies & Services (0.70)
Materials > Metals & Mining > Coal (0.49)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.86)

Add feedback

Unsupervised learning of dynamical and molecular similarity using variance minimization

Husic, Brooke E., Pande, Vijay S.

arXiv.org Machine LearningDec-20-2017

In this report, we present an unsupervised machine learning method for determining groups of molecular systems according to similarity in their dynamics or structures using Ward's minimum variance objective function. We first apply the minimum variance clustering to a set of simulated tripeptides using the information theoretic Jensen-Shannon divergence between Markovian transition matrices in order to gain insight into how point mutations affect protein dynamics. Then, we extend the method to partition two chemoinformatic datasets according to structural similarity to motivate a train/validation/test split for supervised learning that avoids overfitting.

artificial intelligence, machine learning, objective function, (16 more...)

arXiv.org Machine Learning

1712.07704

Country: North America > United States > California (0.28)

Genre: Research Report (0.50)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

Model-Based Clustering of Time-Evolving Networks through Temporal Exponential-Family Random Graph Models

Lee, Kevin H., Xue, Lingzhou, Hunter, David R.

arXiv.org Machine LearningDec-20-2017

Dynamic networks are a general language for describing time-evolving complex systems, and discrete time network models provide an emerging statistical technique for various applications. It is a fundamental research question to detect the community structure in time-evolving networks. However, due to significant computational challenges and difficulties in modeling communities of time-evolving networks, there is little progress in the current literature to effectively find communities in time-evolving networks. In this work, we propose a novel model-based clustering framework for time-evolving networks based on discrete time exponential-family random graph models. To choose the number of communities, we use conditional likelihood to construct an effective model selection criterion. Furthermore, we propose an efficient variational expectation-maximization (EM) algorithm to find approximate maximum likelihood estimates of network parameters and mixing proportions. By using variational methods and minorization-maximization (MM) techniques, our method has appealing scalability for large-scale time-evolving networks. The power of our method is demonstrated in simulation studies and empirical applications to international trade networks and the collaboration networks of a large American research university.

artificial intelligence, bayesian inference, machine learning, (15 more...)

arXiv.org Machine Learning

1712.07325

Country: North America > United States (0.67)

Genre:

Research Report > New Finding (0.48)
Research Report > Experimental Study (0.34)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.88)

Add feedback

A generalised framework for detailed classification of swimming paths inside the Morris Water Maze

Vouros, Avgoustinos, Gehring, Tiago V., Szydlowska, Kinga, Janusz, Artur, Croucher, Mike, Lukasiuk, Katarzyna, Konopka, Witold, Sandi, Carmen, Tu, Zehai, Vasilaki, Eleni

arXiv.org Machine LearningDec-18-2017

The Morris Water Maze is commonly used in behavioural neuroscience for the study of spatial learning with rodents. Over the years, various methods of analysing rodent data collected in this task have been proposed. These methods span from classical performance measurements (e.g. escape latency, rodent speed, quadrant preference) to more sophisticated methods of categorisation which classify the animal swimming path into behavioural classes known as strategies. Classification techniques provide additional insight in relation to the actual animal behaviours but still only a limited amount of studies utilise them mainly because they highly depend on machine learning knowledge. We have previously demonstrated that the animals implement various strategies and by classifying whole trajectories can lead to the loss of important information. In this work, we developed a generalised and robust classification methodology which implements majority voting to boost the classification performance and successfully nullify the need of manual tuning. Based on this framework, we built a complete software, capable of performing the full analysis described in this paper. The software provides an easy to use graphical user interface (GUI) through which users can enter their trajectory data, segment and label them and finally generate reports and figures of the results.

artificial intelligence, classifier, machine learning, (18 more...)

arXiv.org Machine Learning

1711.07446

Country: Europe (0.67)

Genre:

Research Report > New Finding (0.93)
Research Report > Experimental Study (0.71)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Add feedback