AITopics | Clustering

Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)

Outcome-Driven Clustering of Acute Coronary Syndrome Patients using Multi-Task Neural Network with Attention

Xia, Eryu, Du, Xin, Mei, Jing, Sun, Wen, Tong, Suijun, Kang, Zhiqing, Sheng, Jian, Li, Jian, Ma, Changsheng, Dong, Jianzeng, Li, Shaochun

arXiv.org Machine Learning, Mar-27-2019

Cluster analysis aims at separating patients into phenotypically heterogeneous groups and defining therapeutically homogeneous patient subclasses. It is an important approach in data-driven disease classification and subtyping. Acute coronary syndrome (ACS) is a syndrome caused by a sudden decrease in coronary artery blood flow, where disease classification would help to inform therapeutic strategies and provide prognostic insights. Here we conducted an outcome-driven cluster analysis of ACS patients, which jointly considers treatment and patient outcome as indicators of patient state. A multi-task neural network with attention was used as the modeling framework, covering learning of the patient state, cluster analysis, and feature importance profiling. Seven patient clusters were discovered. The clusters have different characteristics, as well as different risk profiles for in-hospital major adverse cardiac events. The results show that cluster analysis using an outcome-driven multi-task neural network is promising for patient classification and subtyping.

artificial intelligence, machine learning, neural network, (15 more...)

arXiv.org Machine Learning

1903.00197

Country: Asia > China > Henan Province (0.14)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (0.96)
Health & Medicine > Diagnostic Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.47)

Ground Profile Recovery from Aerial 3D LiDAR-based Maps

Sabirova, Adelya, Rassabin, Maksim, Fedorenko, Roman, Afanasyev, Ilya

arXiv.org Artificial Intelligence, Mar-26-2019

The paper presents the study and implementation of a ground detection methodology that filters and removes forest points from a LiDAR-based 3D point cloud using the Cloth Simulation Filtering (CSF) algorithm. The methodology makes it possible to recover the terrain relief and create a landscape map of a forested region. As a proof of concept, we conducted an outdoor flight experiment, launching a hexacopter over a mixed forest region with sharp ground changes near the city of Innopolis (Russia), which demonstrated encouraging results for both ground detection and robustness of the methodology.

artificial intelligence, machine learning, point cloud, (14 more...)

arXiv.org Artificial Intelligence

doi: 10.23919/FRUCT.2019.8711928

1903.11097

Country:

Asia > Russia (0.25)
Europe > Russia > Volga Federal District > Republic of Tatarstan (0.15)
Asia > China (0.04)

Genre: Research Report (0.64)

Industry: Information Technology (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (0.96)
Information Technology > Software (0.93)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (0.68)
(2 more...)

Explaining individual predictions when features are dependent: More accurate approximations to Shapley values

Aas, Kjersti, Jullum, Martin, Løland, Anders

arXiv.org Machine Learning, Mar-25-2019

Explaining complex or seemingly simple machine learning models is a practical and ethical question, as well as a legal issue. Can I trust the model? Is it biased? Can I explain it to others? We want to explain individual predictions from a complex machine learning model by learning simple, interpretable explanations. Among existing approaches to interpreting complex models, Shapley values are the only method with a solid theoretical foundation. Kernel SHAP is a computationally efficient approximation to Shapley values in higher dimensions. Like most other existing methods, this approach assumes independent features, which may give very wrong explanations when the features are in fact dependent. This is the case even if a simple linear model is used for predictions. We extend the Kernel SHAP method to handle dependent features. We provide several examples of linear and non-linear models with linear and non-linear feature dependence, where our method gives more accurate approximations to the true Shapley values. We also propose a method for aggregating individual Shapley values, such that the prediction can be explained by groups of dependent variables.
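To make the Shapley value idea in this abstract concrete, here is a minimal sketch that computes exact Shapley values by enumerating all feature subsets. It is not Kernel SHAP and not the authors' extension. The linear model, its weights, the feature means, and the instance `x` are all hypothetical, and the value function replaces absent features with their means, which equals the conditional expectation only for a linear model with independent features (the very assumption the paper relaxes).

```python
from itertools import combinations
from math import factorial


def shapley_values(value, n_features):
    """Exact Shapley values for a value function defined on feature subsets."""
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(n_features):
            # Classic Shapley weight: |S|! (M - |S| - 1)! / M!
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical linear model and data, for illustration only.
weights = [2.0, -1.0, 0.5]
bias = 1.0
means = [0.0, 1.0, 2.0]   # assumed feature means
x = [1.0, 3.0, 4.0]       # instance to explain


def predict(z):
    return bias + sum(w * v for w, v in zip(weights, z))


def value(subset):
    # Features outside the subset are set to their mean (independence assumption).
    z = [x[j] if j in subset else means[j] for j in range(len(x))]
    return predict(z)


phi = shapley_values(value, len(x))
```

Under these assumptions each value reduces to `weights[i] * (x[i] - means[i])`, and the values sum to `predict(x) - predict(means)`. The enumeration grows exponentially with the number of features, which is why approximations are used in higher dimensions.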

artificial intelligence, machine learning, shapley value, (17 more...)

arXiv.org Machine Learning

1903.10464

Country:

Europe (0.46)
Asia (0.28)

Genre: Research Report > Experimental Study (0.46)

Industry:

Law (1.00)
Information Technology > Security & Privacy (0.92)
Banking & Finance (0.67)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

A Strongly Consistent Sparse $k$-means Clustering with Direct $l_1$ Penalization on Variable Weights

Chakraborty, Saptarshi, Das, Swagatam

arXiv.org Machine Learning, Mar-24-2019

We propose the Lasso Weighted $k$-means ($LW$-$k$-means) algorithm as a simple yet efficient sparse clustering procedure for high-dimensional data where the number of features ($p$) can be much larger than the number of observations ($n$). In the $LW$-$k$-means algorithm, we introduce a lasso-based penalty term directly on the feature weights to incorporate feature selection in the framework of sparse clustering. $LW$-$k$-means does not make any distributional assumption about the given dataset and thus yields a non-parametric method for feature selection. We also analytically investigate the convergence of the underlying optimization procedure in $LW$-$k$-means and establish the strong consistency of our algorithm. $LW$-$k$-means is tested on several real-life and synthetic datasets, and through detailed experimental analysis we find that the performance of the method is highly competitive against some state-of-the-art procedures for clustering and feature selection, not only in terms of clustering accuracy but also with respect to computational time.
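The role of feature weights in a weighted $k$-means assignment step can be sketched as follows. This is only an illustration of how a zero weight removes a feature from the distance. It is not the $LW$-$k$-means update, the weights are set by hand rather than learned through a lasso penalty, and the points, centers, and function names are hypothetical.

```python
def weighted_sq_distance(point, center, weights):
    """Squared Euclidean distance with one non-negative weight per feature."""
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, point, center))


def assign(points, centers, weights):
    """Assign each point to the nearest center under the weighted distance."""
    return [
        min(range(len(centers)),
            key=lambda k: weighted_sq_distance(p, centers[k], weights))
        for p in points
    ]


# Hypothetical data: feature 0 carries the cluster structure, feature 1 is noise.
points = [(0.0, 9.0), (0.2, -8.0), (5.0, 8.5), (5.1, -9.0)]
centers = [(0.0, -8.0), (5.0, 8.0)]

labels_equal = assign(points, centers, weights=(1.0, 1.0))   # noise dominates
labels_sparse = assign(points, centers, weights=(1.0, 0.0))  # noise feature dropped
```

With equal weights the noisy second feature drives the assignment, while a weight of zero on that feature recovers the grouping by the first feature. A sparsity-inducing penalty on the weights is one way to reach such zeros automatically.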

algorithm, artificial intelligence, machine learning, (17 more...)

arXiv.org Machine Learning

1903.10039

Country: North America > United States (0.92)

Genre: Research Report (0.83)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Latent Simplex Position Model: High Dimensional Multi-view Clustering with Uncertainty Quantification

Duan, Leo L

arXiv.org Machine Learning, Mar-21-2019

High dimensional data often contain multiple facets, and several clustering patterns (views) can co-exist under different feature subspaces. While multi-view clustering algorithms have been proposed, uncertainty quantification remains difficult: a particular challenge lies in the high complexity of estimating the cluster assignment probability under each view, and/or in efficiently sharing information across views. In this article, we propose an empirical Bayes approach: viewing the similarity matrices generated over subspaces as rough first-stage estimates for co-assignment probabilities, we obtain in their Kullback-Leibler neighborhood a refined low-rank soft cluster graph, formed by the pairwise product of simplex coordinates. Interestingly, each simplex coordinate directly encodes the cluster assignment uncertainty. For multi-view clustering, we equip each similarity matrix with a mixed membership over a small number of latent views, leading to effective dimension reduction. With high model flexibility, the estimation can be succinctly re-parameterized as a continuous optimization problem and is hence amenable to gradient-based computation. Theory establishes the connection of this model to the random cluster graph under multiple views. Compared to single-view clustering approaches, substantially more interpretable results are obtained when clustering brains from a human traumatic brain injury study, using high-dimensional gene expression data.

KEY WORDS: Co-regularized Clustering, Consensus, PAC-Bayes, Random Cluster Graph, Variable Selection
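One way to read the "pairwise product of simplex coordinates" is that each item carries a probability vector over clusters, and the soft cluster graph entry for a pair is the inner product of the two vectors. The sketch below shows only that construction, under the assumption that the two assignments are independent. It is not the paper's estimation procedure, and the membership values are hypothetical.

```python
def co_assignment(a, b):
    """Probability that two items share a cluster, given independent
    assignments drawn from probability vectors a and b."""
    return sum(p * q for p, q in zip(a, b))


# Hypothetical simplex coordinates (each row sums to 1) for three items.
memberships = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],
]

# Soft cluster graph: symmetric matrix of pairwise co-assignment probabilities.
graph = [[co_assignment(a, b) for b in memberships] for a in memberships]
```

Because the matrix is a product of an n-by-K membership matrix with its transpose, its rank is at most the number of clusters K, which matches the low-rank description in the abstract.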

artificial intelligence, machine learning, matrix, (15 more...)

arXiv.org Machine Learning

1903.09029

Country: North America (0.46)

Genre: Research Report (0.82)

Industry: Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Ensemble Clustering for Graphs: Comparisons and Applications

Poulin, Valérie, Théberge, François

arXiv.org Machine Learning, Mar-19-2019

We recently proposed a new ensemble clustering algorithm for graphs (ECG) based on the concept of consensus clustering. We validated our approach by replicating a study comparing graph clustering algorithms over benchmark graphs, showing that ECG outperforms the leading algorithms. In this paper, we extend our comparison by considering a wider range of parameters for the benchmark, generating graphs with different properties. We provide new experimental results showing that the ECG algorithm alleviates the well-known resolution limit issue, and that it leads to better stability of the partitions. We also illustrate how the ensemble obtained with ECG can be used to quantify the presence of community structure in the graph, and to zoom in on the sub-graph most closely associated with seed vertices. Finally, we illustrate further applications of ECG by comparing it to previous results for community detection on weighted graphs, and community-aware anomaly detection.
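The consensus idea behind ensemble clustering on graphs can be sketched as follows: given several partitions of the same vertex set, weight each edge by the fraction of partitions in which its endpoints share a cluster. This is a generic co-association sketch, not the ECG algorithm as published. The graph, the partitions, and the function name are hypothetical.

```python
def co_association_weights(edges, partitions):
    """For each edge, the fraction of partitions placing both endpoints
    in the same cluster. Each partition maps vertex index to a label."""
    weights = {}
    for u, v in edges:
        together = sum(1 for labels in partitions if labels[u] == labels[v])
        weights[(u, v)] = together / len(partitions)
    return weights


# Hypothetical path graph on four vertices and three partitions of it.
edges = [(0, 1), (1, 2), (2, 3)]
partitions = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
]

edge_weights = co_association_weights(edges, partitions)
```

Edges with weights near 1 are consistently inside a community across the ensemble, while low weights mark edges that tend to cross community boundaries. A final clustering can then be run on the reweighted graph.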

artificial intelligence, data mining, machine learning, (18 more...)

arXiv.org Machine Learning

1903.08012

Country: North America > Canada > Ontario > National Capital Region > Ottawa (0.14)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.78)
Information Technology > Data Science > Data Mining > Anomaly Detection (0.54)

Spherical Principal Component Analysis

Liu, Kai, Li, Qiuwei, Wang, Hua, Tang, Gongguo

arXiv.org Machine Learning, Mar-16-2019

In many real-world applications such as text categorization and face recognition, the dimensions of data are usually very high. Dealing with high-dimensional data is computationally expensive while noise or outliers in the data can increase dramatically as the dimension increases. Dimension reduction is one of the most important and effective methods to handle high dimensional data [4, 17, 20]. Among the dimension reduction methods, Principal Component Analysis (PCA) is one of the most widely used methods due to its simplicity and effectiveness. PCA is a statistical procedure that uses an orthogonal transformation to convert a set of correlated variables into a set of linearly uncorrelated principal directions. Usually the number of principal directions is less than or equal to the number of original variables. This transformation is defined in such a way that the first principal direction has the largest possible variance (that is, accounts for as much of the variability in the data as possible), and each succeeding direction has the highest variance under the constraint that it is orthogonal to the preceding directions. The resulting vectors are an uncorrelated orthogonal basis set. When data points lie in a low-dimensional manifold and the manifold is linear or nearly-linear, the low-dimensional structure of data can be effectively captured by a linear subspace spanned by the principal PCA directions.
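The description of PCA above can be illustrated with a small sketch that finds the first principal direction, the unit vector of largest variance, by power iteration on the sample covariance matrix. This is standard PCA rather than the spherical variant the paper proposes. The data, the function name, and the iteration count are assumptions, and the sketch assumes the data are not constant.

```python
import math


def first_principal_direction(rows, iterations=200):
    """Return (unit direction, variance along it) for the leading
    principal component, using power iteration on the covariance matrix."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    v = [1.0 / math.sqrt(d)] * d  # arbitrary unit starting vector
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    variance = sum(v[a] * cov[a][b] * v[b] for a in range(d) for b in range(d))
    return v, variance


# Hypothetical data lying exactly on the line y = 2x.
data = [(-2.0, -4.0), (-1.0, -2.0), (0.0, 0.0), (1.0, 2.0), (2.0, 4.0)]
direction, variance = first_principal_direction(data)
```

For this data the direction is proportional to (1, 2) and captures all of the variance. Later principal directions could be obtained by removing the component along `direction` from the centered data and repeating, which enforces the orthogonality constraint mentioned above.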

artificial intelligence, euclidean distance, machine learning, (17 more...)

arXiv.org Machine Learning

1903.06877

Country: North America > United States > Colorado (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Principal Component Analysis (0.61)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

How to Automatically Determine the Number of Clusters in your Data - and more

#artificialintelligence, Mar-15-2019, 22:44:09 GMT

Determining the number of clusters when performing unsupervised clustering is a tricky problem. Many data sets don't exhibit well-separated clusters, and two people asked to judge the number of clusters by looking at a chart are likely to give two different answers. Sometimes clusters overlap with each other, and large clusters contain sub-clusters, which makes the decision difficult. For instance, how many clusters do you see in the picture below? What is the optimum number of clusters?

artificial intelligence, machine learning, strength, (18 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Multi-Stage Fault Warning for Large Electric Grids Using Anomaly Detection and Machine Learning

Raja, Sanjeev, Fokoué, Ernest

arXiv.org Machine Learning, Mar-15-2019

In the monitoring of a complex electric grid, it is of paramount importance to provide operators with early warnings of anomalies detected on the network, along with a precise classification and diagnosis of the specific fault type. In this paper, we propose a novel multi-stage early warning system prototype for electric grid fault detection, classification, subgroup discovery, and visualization. In the first stage, a computationally efficient anomaly detection method based on quartiles detects the presence of a fault in real time. In the second stage, the fault is classified into one of nine pre-defined disaster scenarios. The time series data are first mapped to highly discriminative features by applying dimensionality reduction based on temporal autocorrelation. The features are then mapped through one of three classification techniques: support vector machine, random forest, and artificial neural network. Finally in the third stage, intra-class clustering based on dynamic time warping is used to characterize the fault with further granularity. Results on the Bonneville Power Administration electric grid data show that i) the proposed anomaly detector is both fast and accurate; ii) dimensionality reduction leads to dramatic improvement in classification accuracy and speed; iii) the random forest method offers the most accurate, consistent, and robust fault classification; and iv) time series within a given class naturally separate into five distinct clusters which correspond closely to the geographical distribution of electric grid buses.
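A quartile-based anomaly detector of the kind mentioned for the first stage can be sketched with interquartile-range fences. The abstract does not specify the exact rule, so the Tukey-style fences, the multiplier `k=1.5`, the function names, and the frequency readings below are all assumptions made for illustration.

```python
from statistics import quantiles


def iqr_fences(baseline, k=1.5):
    """Lower and upper fences at k interquartile ranges beyond Q1 and Q3."""
    q1, _, q3 = quantiles(baseline, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def detect_anomalies(stream, fences):
    """Indices of readings that fall outside the fences."""
    low, high = fences
    return [i for i, value in enumerate(stream) if value < low or value > high]


# Hypothetical grid frequency readings in Hz under normal operation.
baseline = [59.9, 60.0, 60.1, 60.0, 59.95, 60.05, 60.0, 60.02, 59.98, 60.0]
fences = iqr_fences(baseline)

# Hypothetical incoming readings; two of them deviate sharply.
stream = [60.0, 60.01, 58.7, 60.0, 61.5]
anomalies = detect_anomalies(stream, fences)  # expected: [2, 4]
```

Once the fences are computed from a baseline window, each new reading needs only two comparisons, which is consistent with the real-time use described in the abstract.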

artificial intelligence, data mining, machine learning, (18 more...)

arXiv.org Machine Learning

1903.067

Country: North America > United States > Michigan (0.28)

Genre: Research Report (0.64)

Industry:

Energy > Power Industry (1.00)
Energy > Renewable > Hydroelectric (0.54)
Government > Regional Government > North America Government > United States Government (0.34)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.47)
