AITopics | Clustering

Collaborating Authors

Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)

News Overviews Instructional Materials AI-Alerts Classics

Adaptive Nonparametric Variational Autoencoder

Zhao, Tingting, Wang, Zifeng, Masoomi, Aria, Dy, Jennifer G.

arXiv.org Machine LearningJun-7-2019

Clustering is used to find structure in unlabeled data by grouping similar objects together. Cluster analysis depends on the definition of similarity in the feature space. In this paper, we propose an Adaptive Nonparametric Variational Autoencoder (AdapVAE) to perform end-to-end feature learning from raw data jointly with cluster membership learning through a Nonparametric Bayesian modeling framework with deep neural networks. It has the advantage of avoiding pre-definition of similarity or feature engineering. Our model relaxes the constraint of fixing the number of clusters in advance by assigning a Dirichlet Process prior on the latent representation in a low-dimensional feature space. It can adaptively detect novel clusters when new data arrives based on a learned model from historical data in an online unsupervised learning setting. We develop a joint online variational inference algorithm to learn feature representations and cluster assignments via iteratively optimizing the evidence lower bound (ELBO). Our experimental results demonstrate the capacity of our modelling framework to learn the number of clusters automatically using data, the flexibility to detect novel clusters with emerging data adaptively, the ability of high quality reconstruction and generation of samples without supervised information and the improvement over state-of-the-art end-to-end clustering methods in terms of accuracy on both image and text corpora benchmark datasets.

artificial intelligence, machine learning, representation, (17 more...)

arXiv.org Machine Learning

1906.03288

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)

Add feedback

Data Sampling for Graph Based Unsupervised Learning: Convex and Greedy Optimization

Vahidian, Saeed, Cloninger, Alexander, Mirzasoleiman, Baharan

arXiv.org Machine LearningJun-7-2019

In a number of situations, collecting a function value for every data point may be prohibitively expensive, and random sampling ignores any structure in the underlying data. We introduce a scalable optimization algorithm with no correction steps (in contrast to Frank-Wolfe and its variants), a variant of gradient ascent for coreset selection in graphs, that greedily selects a weighted subset of vertices that are deemed most important to sample. Our algorithm estimates the mean of the function by taking a weighted sum only at these vertices, and we provably bound the estimation error in terms of the location and weights of the selected vertices in the graph. In addition, we consider the case where nodes have different selection costs and provide bounds on the quality of the low-cost selected coresets. We demonstrate the benefits of our algorithm on point clouds and structured graphs, as well as sensor placement where the cost of placing sensors depends on the location of the placement. We also elucidate that the empirical convergence of our proposed method is faster than random selection and various clustering methods while still respecting sensor placement cost. The paper concludes with validation of the developed algorithm on both synthetic and real datasets, demonstrating that it performs very well compared to the current state of the art.

algorithm, artificial intelligence, machine learning, (18 more...)

arXiv.org Machine Learning

1906.01021

Country: North America > United States > California (0.69)

Genre: Research Report (0.82)

Industry: Information Technology (0.46)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.84)

Add feedback

Failures detection at directional drilling using real-time analogues search

Gurina, Ekaterina, Klyuchnikov, Nikita, Zaytsev, Alexey, Romanenkova, Evgenya, Antipova, Ksenia, Simon, Igor, Makarov, Victor, Koroteev, Dmitry

arXiv.org Machine LearningJun-6-2019

One of the main challenges in the construction of oil and gas wells is the need to detect and avoid abnormal situations, which can lead to accidents. Accidents have some indicators that help to find them during the drilling process. In this article, we present a data-driven model trained on historical data from drilling accidents that can detect different types of accidents using real-time signals. The results show that using the time-series comparison, based on aggregated statistics and gradient boosting classification, it is possible to detect an anomaly and identify its type by comparing current measurements while drilling with the stored ones from the database of accidents.

accident, artificial intelligence, upstream oil & gas, (15 more...)

arXiv.org Machine Learning

1906.02667

Country: Europe > Russia (0.28)

Genre: Research Report (0.70)

Industry: Energy > Oil & Gas > Upstream (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.95)
Information Technology > Security & Privacy (0.93)
(3 more...)

Add feedback

Learning Clustered Representation for Complex Free Energy Landscapes

Zhang, Jun, Lei, Yao-Kun, Che, Xing, Zhang, Zhen, Yang, Yi Isaac, Gao, Yi Qin

arXiv.org Machine LearningJun-6-2019

In this paper we first analyzed the inductive bias underlying the data scattered across complex free energy landscapes (FEL), and exploited it to train deep neural networks which yield reduced and clustered representation for the FEL. Our parametric method, called Information Distilling of Metastability (IDM), is end-to-end differentiable thus scalable to ultra-large dataset. IDM is also a clustering algorithm and is able to cluster the samples in the meantime of reducing the dimensions. Besides, as an unsupervised learning method, IDM differs from many existing dimensionality reduction and clustering methods in that it neither requires a cherry-picked distance metric nor the ground-true number of clusters, and that it can be used to unroll and zoom-in the hierarchical FEL with respect to different timescales. Through multiple experiments, we show that IDM can achieve physically meaningful representations which partition the FEL into well-defined metastable states hence are amenable for downstream tasks such as mechanism analysis and kinetic modeling.

artificial intelligence, machine learning, metastable state, (20 more...)

arXiv.org Machine Learning

1906.02852

Country:

Asia > China (0.30)
North America > United States (0.28)

Genre: Research Report (0.82)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

Unsupervised learning and its role in the knowledge discovery process

#artificialintelligenceJun-5-2019, 04:21:17 GMT

Unlike supervised learning, unsupervised learning not working with labeled data, it is not showing the machine the correct answer. Instead, it is using different algorithms to let the machine create connections by studying and observing the data. Learning and improving by trial and error is the key to unsupervised learning. However, the Knowledge Discovery process is the field of data mining is concerned with the development of methods, techniques and algorithm which can make sense of the available data. It is useful in finding trends, patterns, correlations and anomalies in the databases which is helpful to make accurate decisions for the future.

Add feedback

Bayesian Hierarchical Mixture Clustering using Multilevel Hierarchical Dirichlet Processes

Huang, Weipeng, Laitonjam, Nishma, Piao, Guangyuan, Hurley, Neil

arXiv.org Artificial IntelligenceJun-5-2019

This paper focuses on the problem of hierarchical non-overlapping clustering of a dataset. In such a clustering, each data item is associated with exactly one leaf node and each internal node is associated with all the data items stored in the sub-tree beneath it, so that each level of the hierarchy corresponds to a partition of the dataset. We develop a novel Bayesian nonparametric method combining the nested Chinese Restaurant Process (nCRP) and the Hierarchical Dirichlet Process (HDP). Compared with other existing Bayesian approaches, our solution tackles data with complex latent mixture features which has not been previously explored in the literature. We discuss the details of the model and the inference procedure. Furthermore, experiments on three datasets show that our method achieves solid empirical results in comparison with existing algorithms.

artificial intelligence, data mining, machine learning, (15 more...)

arXiv.org Artificial Intelligence

1905.05022

Country:

Asia > Middle East > Jordan (0.04)
Asia > Afghanistan > Parwan Province > Charikar (0.04)
North America > United States > Massachusetts (0.04)
(2 more...)

Genre: Research Report (0.50)

Industry: Consumer Products & Services > Restaurants (0.89)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.95)
Information Technology > Data Science > Data Mining (0.94)
(2 more...)

Add feedback

Brain-Network Clustering via Kernel-ARMA Modeling and the Grassmannian

Ye, Cong, Slavakis, Konstantinos, Patil, Pratik V., Muldoon, Sarah F., Medaglia, John

arXiv.org Machine LearningJun-5-2019

Recent advances in neuroscience and in the technology of functional magnetic resonance imaging (fMRI) and electro-encephalography (EEG) have propelled a growing interest in brain-network clustering via time-series analysis. Notwithstanding, most of the brain-network clustering methods revolve around state clustering and/or node clustering (a.k.a. community detection or topology inference) within states. This work answers first the need of capturing non-linear nodal dependencies by bringing forth a novel feature-extraction mechanism via kernel autoregressive-moving-average modeling. The extracted features are mapped to the Grassmann manifold (Grassmannian), which consists of all linear subspaces of a fixed rank. By virtue of the Riemannian geometry of the Grassmannian, a unifying clustering framework is offered to tackle all possible clustering problems in a network: Cluster multiple states, detect communities within states, and even identify/track subnetwork state sequences. The effectiveness of the proposed approach is underlined by extensive numerical tests on synthetic and real fMRI/EEG data which demonstrate that the advocated learning method compares favorably versus several state-of-the-art clustering schemes.

artificial intelligence, data mining, machine learning, (15 more...)

arXiv.org Machine Learning

1906.02292

Country: North America > United States (0.93)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Health Care Technology (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

A numerical measure of the instability of Mapper-type algorithms

Belchí, Francisco, Brodzki, Jacek, Burfitt, Matthew, Niranjan, Mahesan

arXiv.org Machine LearningJun-4-2019

Mapper is an unsupervised machine learning algorithm generalising the notion of clustering to obtain a geometric description of a dataset. The procedure splits the data into possibly overlapping bins which are then clustered. The output of the algorithm is a graph where nodes represent clusters and edges represent the sharing of data points between two clusters. However, several parameters must be selected before applying Mapper and the resulting graph may vary dramatically with the choice of parameters. We define an intrinsic notion of Mapper instability that measures the variability of the output as a function of the choice of parameters required to construct a Mapper output. Our results and discussion are general and apply to all Mapper-type algorithms. We derive theoretical results that provide estimates for the instability and suggest practical ways to control it. We provide also experiments to illustrate our results and in particular we demonstrate that a reliable candidate Mapper output can be identified as a local minimum of instability regarded as a function of Mapper input parameters.

artificial intelligence, instability, machine learning, (16 more...)

arXiv.org Machine Learning

1906.01507

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > New York > New York County > New York City (0.04)
(3 more...)

Genre: Research Report > New Finding (0.68)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (0.67)
Health & Medicine > Therapeutic Area > Neurology (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Unsupervised Temporal Clustering to Monitor the Performance of Alternative Fueling Infrastructure

Ramea, Kalai

arXiv.org Machine LearningJun-4-2019

Zero Emission Vehicles (ZEV) play an important role in the decarbonization of the transportation sector. For a wider adoption of ZEVs, providing a reliable infrastructure is critical. We present a machine learning approach that uses unsupervised temporal clustering algorithm along with survey analysis to determine infrastructure performance and reliability of alternative fuels. We illustrate this approach for the hydrogen fueling stations in California, but this can be generalized for other regions and fuels.

artificial intelligence, machine learning, unsupervised temporal clustering, (16 more...)

arXiv.org Machine Learning

1906.03077

Country: North America > United States > California (0.95)

Genre:

Questionnaire & Opinion Survey (0.70)
Research Report (0.50)

Industry:

Transportation > Ground > Road (1.00)
Energy > Renewable > Hydrogen (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

Add feedback

Attributed Graph Clustering via Adaptive Graph Convolution

Zhang, Xiaotong, Liu, Han, Li, Qimai, Wu, Xiao-Ming

arXiv.org Artificial IntelligenceJun-4-2019

Attributed graph clustering is challenging as it requires joint modelling of graph structures and node attributes. Recent progress on graph convolutional networks has proved that graph convolution is effective in combining structural and content information, and several recent methods based on it have achieved promising clustering performance on some real attributed networks. However, there is limited understanding of how graph convolution affects clustering performance and how to properly use it to optimize performance for different graphs. Existing methods essentially use graph convolution of a fixed and low order that only takes into account neighbours within a few hops of each node, which underutilizes node relations and ignores the diversity of graphs. In this paper, we propose an adaptive graph convolution method for attributed graph clustering that exploits high-order graph convolution to capture global cluster structure and adaptively selects the appropriate order for different graphs. We establish the validity of our method by theoretical analysis and extensive experiments on benchmark datasets. Empirical results show that our method compares favourably with state-of-the-art methods.

artificial intelligence, graph convolution, machine learning, (19 more...)

arXiv.org Artificial Intelligence

1906.0121

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback