AITopics | Clustering

Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)

Reference-Based Sequence Classification

He, Zengyou, Xu, Guangyao, Sheng, Chaohua, Xu, Bo, Zou, Quan

arXiv.org Machine Learning | May-17-2019

Sequence classification is an important data mining task in many real-world applications. Over the past few decades, many sequence classification methods have been proposed from different aspects. In particular, the pattern-based method is one of the most important and widely studied sequence classification methods in the literature. In this paper, we present a reference-based sequence classification framework, which can unify existing pattern-based sequence classification methods under the same umbrella. More importantly, this framework can be used as a general platform for developing new sequence classification algorithms. By utilizing this framework as a tool, we propose new sequence classification algorithms that are quite different from existing solutions. Experimental results show that new methods developed under the proposed framework are capable of achieving classification accuracy comparable to that of state-of-the-art sequence classification algorithms.

artificial intelligence, machine learning, pattern recognition, (20 more...)

arXiv.org Machine Learning

1905.07188

Country:

Asia > China > Liaoning Province > Dalian (0.04)
Asia > China > Sichuan Province > Chengdu (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.67)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Sampling Clustering

Tarn, Ching, Zhang, Yinan, Feng, Ye

arXiv.org Artificial Intelligence | May-16-2019

We propose an efficient linear-time graph-based divisive cluster analysis approach called Sampling Clustering. It constructs a lite informative dendrogram by recursively dividing a graph into subgraphs. In each recursive call, a graph is first sampled, with a set of vertices being removed to disconnect latent clusters, and then condensed by adding edges to the remaining vertices to avoid the graph fragmentation caused by vertex removals. We also present some sampling and condensing methods and discuss their effectiveness in this paper. Our implementations run in linear time and achieve outstanding performance on various types of datasets. Experimental results show that they outperform state-of-the-art clustering algorithms with significantly lower computing resource requirements.

data mining, machine learning, vertex, (19 more...)

arXiv.org Artificial Intelligence

1806.08245

Country:

Asia (0.28)
North America > United States (0.28)

Genre: Research Report (0.70)

Industry:

Education (0.46)
Health & Medicine (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Adaptation of Multivariate Concept to Multi-Way Agglomerative Clustering for Hierarchical Aspect Aggregation

Malepathirana, Tamasha (University of Moratuwa) | Perera, Rashindrie (University of Moratuwa) | Abeysinghe, Yasasi (University of Moratuwa) | Albar, Yumna (University of Moratuwa) | Thayasivam, Uthayasanker (University of Moratuwa)

AAAI Conferences | May-15-2019

Hierarchical review aspect aggregation is an important challenge in review summarization. Currently, agglomerative clustering is widely used for hierarchical aspect aggregation. We identify an important but less studied issue in using agglomerative clustering for the aforementioned task. This paper proposes a novel approach to generate a multi-way hierarchy by adaptation of the multivariate concept. Furthermore, we propose a novel experimentation approach to evaluate the acceptability of the aspect relations obtained from the hierarchy generated.

algorithm, aspect aggregation, hierarchy, (14 more...)

AAAI Conferences

The Thirty-Second International FLAIRS Conference

Country:

North America > United States > New York > New York County > New York City (0.04)
Asia > Sri Lanka (0.04)
Asia > India (0.04)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Gene Selection and Clustering of Breast Cancer Data

Bhuiyan, Farzana Ahamed (Tennessee Technological University) | Sharif, MD Bulbul (Tennessee Technological University) | Tinker, Paul Joshua (Tennessee Technological University) | Eberle, William (Tennessee Technological University) | Talbert, Douglas A. (Tennessee Technological University) | Ghafoor, Sheikh Khaled (Tennessee Technological University) | Frey, Lewis (Medical University of South Carolina)

AAAI Conferences | May-15-2019

In this work, we first attempt to replicate an earlier study on gene selection and clustering, and then we extend this work by applying a different type of hierarchical clustering to discover interesting subsets of genes from breast cancer data. Replication of such studies is a known challenge and an active area of research in bioinformatics. The work presented in this paper is three-fold. First, we replicate a study conducted at the University of North Carolina to generate an initial set of genes. Second, we apply an approach called Distance Weighted Discrimination to fuse multiple, disparate breast cancer datasets into a single validation set. Third, we perform hierarchical clustering and k-means clustering on this validation set to discover natural groupings and compare the clusters generated by both methods. While applying the hierarchical clustering is part of the reproduction step, we extend the research by trying two different forms of hierarchical clustering. We also apply k-means clustering for the same purpose and compare all three methods using Kaplan-Meier estimation and Cox proportional hazards regression. We discover that among the three methods, k-means clustering gives us the best results.
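The abstract compares clusterings through Kaplan-Meier estimation. As a rough illustration of what that estimator computes, here is a minimal pure-Python sketch of the product-limit formula. The function name `kaplan_meier` and the toy follow-up data are hypothetical and are not taken from the paper; a real analysis would more likely use a survival-analysis library.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: observed follow-up time for each subject
    events:    1 if the event (e.g. death) was observed, 0 if censored
    """
    deaths = Counter(t for t, e in zip(durations, events) if e)
    leaving = Counter(durations)  # every subject leaves the risk set at its own time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]  # drop both events and censored subjects
    return curve


# Hypothetical follow-up data for the patients of two clusters.
cluster_a = kaplan_meier([5, 8, 8, 12, 20], [1, 1, 0, 1, 0])
cluster_b = kaplan_meier([2, 3, 3, 6, 7], [1, 1, 1, 0, 1])
for name, curve in (("cluster A", cluster_a), ("cluster B", cluster_b)):
    print(name, [(t, round(s, 3)) for t, s in curve])
```

Curves that separate clearly between clusters suggest that the grouping carries prognostic information, which is the kind of comparison the abstract describes.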

clustering, dataset, hierarchical clustering, (13 more...)

AAAI Conferences

The Thirty-Second International FLAIRS Conference

Country:

North America > United States > North Carolina (0.24)
North America > United States > South Carolina > Charleston County > Charleston (0.14)
Asia > Singapore (0.05)
(2 more...)

Genre:

Research Report > Experimental Study (0.90)
Research Report > New Finding (0.68)

Industry: Health & Medicine > Therapeutic Area > Oncology > Breast Cancer (0.83)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Spectral Clustering of Signed Graphs via Matrix Power Means

Mercado, Pedro, Tudisco, Francesco, Hein, Matthias

arXiv.org Machine Learning | May-15-2019

Signed graphs encode positive (attractive) and negative (repulsive) relations between nodes. We extend spectral clustering to signed graphs via the one-parameter family of Signed Power Mean Laplacians, defined as the matrix power mean of normalized standard and signless Laplacians of positive and negative edges. We provide a thorough analysis of the proposed approach in the setting of a general Stochastic Block Model that includes models such as the Labeled Stochastic Block Model and the Censored Block Model. We show that in expectation the signed power mean Laplacian captures the ground truth clusters under reasonable settings where state-of-the-art approaches fail. Moreover, we prove that the eigenvalues and eigenvectors of the signed power mean Laplacian concentrate around their expectation under reasonable conditions in the general Stochastic Block Model. Extensive experiments on random graphs and real-world datasets confirm the theoretically predicted behaviour of the signed power mean Laplacian and show that it compares favourably with state-of-the-art methods.
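The verbal definition above can be written out as follows. This is a sketch using standard notation that does not appear in the listing itself: $W^{+}$ and $W^{-}$ are assumed to be the adjacency matrices of the positive and negative edges, $D^{+}$ and $D^{-}$ the corresponding diagonal degree matrices, and $p$ the single parameter of the family.

```latex
% Assumed notation: W^{\pm} adjacency matrices, D^{\pm} diagonal degree matrices.
\begin{align*}
L^{+}_{\mathrm{sym}} &= I - (D^{+})^{-1/2}\, W^{+}\, (D^{+})^{-1/2}
  && \text{(normalized Laplacian of positive edges)} \\
Q^{-}_{\mathrm{sym}} &= I + (D^{-})^{-1/2}\, W^{-}\, (D^{-})^{-1/2}
  && \text{(normalized signless Laplacian of negative edges)} \\
M_p(A, B) &= \left( \frac{A^{p} + B^{p}}{2} \right)^{1/p}
  && \text{(matrix power mean)} \\
L_p &= M_p\!\left( L^{+}_{\mathrm{sym}},\; Q^{-}_{\mathrm{sym}} \right)
  && \text{(signed power mean Laplacian)}
\end{align*}
```

Under this reading, $p = 1$ gives the arithmetic mean of the two Laplacians and $p = -1$ the harmonic mean. Negative powers require positive definite arguments, so a small diagonal shift is typically needed in practice because both Laplacians can be singular.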

artificial intelligence, machine learning, social media, (16 more...)

arXiv.org Machine Learning

1905.0623

Country: North America > United States > California (0.27)

Genre: Research Report > Promising Solution (0.54)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.93)
Information Technology > Communications > Social Media (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.66)

EasiCS: the objective and fine-grained classification method of cervical spondylosis dysfunction

Wang, Nana, Cui, Li, Huang, Xi, Xiang, Yingcong, Xiao, Jing, Rao, Yi

arXiv.org Machine Learning | May-15-2019

In order to achieve it, we proposed and developed the classification framework EasiCS, which consists of dimension reduction, the clustering algorithm EasiSOM, and the spectral clustering algorithm EasiSC, as shown in Figure 1, to obtain relatively stable clustering results. To the best of our knowledge, EasiCS is the first effort to utilize the clustering algorithm and sEMG. Compared with the seven commonly used clustering algorithms, the novel framework EasiCS provides the best overall performance.

Cervical spondylosis (CS), a common degenerative disease, harms human life and health, affects up to two-thirds of the population, and poses a serious burden on individuals and society (Matz et al. 2009; Kotil and Bilge 2008; Cai et al. 2016; Nana Wang; Wang et al. 2018). Currently, the neck disability index (Howard Vernon) is the most commonly used tool to assess neck dysfunction (Vernon and Mior 1991), the availability of which is mainly undermined by its coarse-grained and unreasonable classification, and the NDI information is subjective and not accurate enough.

algorithm, artificial intelligence, machine learning, (16 more...)

arXiv.org Machine Learning

1905.05987

Country: Asia > China (0.16)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Therapeutic Area > Musculoskeletal (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

A self-organising eigenspace map for time series clustering

Rahmani, Donya, Fay, Damien, Brodzki, Jacek

arXiv.org Machine Learning | May-14-2019

This paper presents a novel time series clustering method, the self-organising eigenspace map (SOEM), based on a generalisation of the well-known self-organising feature map (SOFM). The SOEM operates on the eigenspaces of the embedded covariance structures of time series, which are related directly to modes in those time series. Approximate joint diagonalisation acts as a pseudo-metric across these spaces, allowing us to generalise the SOFM to a neural network with matrix input. The technique is empirically validated against three sets of experiments: univariate and multivariate time series clustering, and application to (clustered) multivariate time series forecasting. Results indicate that the technique performs a valid topologically ordered clustering of the time series. The clustering is superior in comparison to standard benchmarks when the data is non-aligned and gives the best clustering stage when used in forecasting. It can also be used with partial/non-overlapping time series and multivariate clustering, and it produces a topological representation of the time series objects.

artificial intelligence, data mining, machine learning, (18 more...)

arXiv.org Machine Learning

1905.0554

Country: North America > United States (1.00)

Genre: Research Report (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Multi-View Multiple Clustering

Yao, Shixing, Yu, Guoxian, Wang, Jun, Domeniconi, Carlotta, Zhang, Xiangliang

arXiv.org Machine Learning | May-13-2019

Multiple clustering aims at exploring alternative clusterings to organize the data into meaningful groups from different perspectives. Existing multiple clustering algorithms are designed for single-view data. We assume that the individuality and commonality of multi-view data can be leveraged to generate high-quality and diverse clusterings. To this end, we propose a novel multi-view multiple clustering (MVMC) algorithm. MVMC first adapts multi-view self-representation learning to explore the individuality encoding matrices and the shared commonality matrix of multi-view data. It additionally reduces the redundancy (i.e., enhances the individuality) among the matrices using the Hilbert-Schmidt Independence Criterion (HSIC), and collects shared information by forcing the shared matrix to be smooth across all views. It then uses matrix factorization on the individual matrices, along with the shared matrix, to generate diverse clusterings of high quality. We further extend multiple co-clustering to multi-view data and propose a solution called multi-view multiple co-clustering (MVMCC). Our empirical study shows that MVMC (MVMCC) can exploit multi-view data to generate multiple high-quality and diverse clusterings (co-clusterings), with superior performance to the state-of-the-art methods.
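The redundancy term mentioned above relies on HSIC, a kernel-based dependence measure. The following is a minimal pure-Python sketch of the standard biased empirical estimator, trace(K H L H) / (n - 1)^2 with the centering matrix H = I - (1/n) 11^T. The helper names and the use of a linear kernel are assumptions made for illustration; the listing does not say which kernel or which exact form of HSIC the authors use.

```python
def linear_kernel(X):
    """Gram matrix of inner products between the rows of X (assumed kernel)."""
    return [[sum(a * b for a, b in zip(x, y)) for y in X] for x in X]


def center(K):
    """Double-center a square kernel matrix, i.e. compute H K H."""
    n = len(K)
    row = [sum(r) / n for r in K]
    col = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    total = sum(row) / n
    return [[K[i][j] - row[i] - col[j] + total for j in range(n)]
            for i in range(n)]


def hsic(K, L):
    """Biased empirical HSIC between two kernel matrices of equal size."""
    n = len(K)
    Kc, Lc = center(K), center(L)
    # trace(Kc @ Lc) equals trace(K H L H) because H is idempotent.
    trace = sum(Kc[i][j] * Lc[j][i] for i in range(n) for j in range(n))
    return trace / (n - 1) ** 2


# Two hypothetical representations of the same four samples.
view_a = [[1.0, 0.0], [2.0, 1.0], [3.0, 0.0], [4.0, 1.0]]
view_b = [[0.0, 1.0], [1.0, 1.0], [0.0, 1.0], [1.0, 1.0]]
print(hsic(linear_kernel(view_a), linear_kernel(view_a)))  # a view against itself
print(hsic(linear_kernel(view_a), linear_kernel(view_b)))  # across the two views
```

A value near zero indicates little statistical dependence between the two representations, which is why penalizing HSIC can push learned matrices toward diversity.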

artificial intelligence, data mining, machine learning, (19 more...)

arXiv.org Machine Learning

1905.05053

Country: Asia (0.28)

Genre: Research Report > Promising Solution (0.34)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.89)
Information Technology > Data Science > Data Mining (0.89)

Determining Number of Clusters in One Picture

#artificialintelligence | May-12-2019, 01:27:55 GMT

If you want to determine the optimal number of clusters in your analysis, you're faced with an overwhelming number of (mostly subjective) choices. Note that there's no "best" method, no "correct" k, and there isn't even a consensus as to the definition of what a "cluster" is. With that said, this picture focuses on three popular methods that should fit almost every need: Silhouette, Elbow, and Gap Statistic.
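Two of the three methods named above can be sketched in a few lines of pure Python: the Elbow method tracks within-cluster sum of squares (inertia) as k grows, and the Silhouette method picks the k with the highest mean silhouette score. The k-means routine, its deterministic farthest-first initialization, and the toy points are all assumptions made for this illustration; the Gap Statistic is not covered here.

```python
import math  # math.dist requires Python 3.8+


def nearest(p, centers):
    """Index of the center closest to point p."""
    return min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))


def kmeans(points, k, iters=100):
    """Plain Lloyd iterations with farthest-first initialization."""
    points = [tuple(p) for p in points]
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points,
                           key=lambda p: min(math.dist(p, c) for c in centers)))
    for _ in range(iters):
        labels = [nearest(p, centers) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            # Keep the old center if a cluster ends up empty.
            updated.append(tuple(sum(c) / len(members) for c in zip(*members))
                           if members else centers[j])
        if updated == centers:
            break
        centers = updated
    return [nearest(p, centers) for p in points], centers


def inertia(points, labels, centers):
    """Within-cluster sum of squared distances (the Elbow criterion)."""
    return sum(math.dist(p, centers[l]) ** 2 for p, l in zip(points, labels))


def silhouette(points, labels):
    """Mean silhouette score; singleton clusters score 0 by convention."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    scores = []
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1 or len(clusters) < 2:
            scores.append(0.0)
            continue
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != l)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical data: three well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]
results = {}
for k in range(2, 6):
    labels, centers = kmeans(points, k)
    results[k] = (inertia(points, labels, centers), silhouette(points, labels))
    print(k, round(results[k][0], 2), round(results[k][1], 3))
best_k = max(results, key=lambda k: results[k][1])
print("best k by silhouette:", best_k)
```

The inertia column is read by eye for the point where the decrease flattens, which is part of why the article calls these choices mostly subjective, while the silhouette column gives a single number to maximize.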

artificial intelligence, machine learning

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.74)

Explainable AI for Trees: From Local Explanations to Global Understanding

Lundberg, Scott M., Erion, Gabriel, Chen, Hugh, DeGrave, Alex, Prutkin, Jordan M., Nair, Bala, Katz, Ronit, Himmelfarb, Jonathan, Bansal, Nisha, Lee, Su-In

arXiv.org Machine Learning | May-11-2019

Tree-based machine learning models such as random forests, decision trees, and gradient boosted trees are the most popular non-linear predictive models used in practice today, yet comparatively little attention has been paid to explaining their predictions. Here we significantly improve the interpretability of tree-based models through three main contributions: 1) The first polynomial time algorithm to compute optimal explanations based on game theory. 2) A new type of explanation that directly measures local feature interaction effects. 3) A new set of tools for understanding global model structure based on combining many local explanations of each prediction. We apply these tools to three medical machine learning problems and show how combining many high-quality local explanations allows us to represent global structure while retaining local faithfulness to the original model. These tools enable us to i) identify high magnitude but low frequency non-linear mortality risk factors in the general US population, ii) highlight distinct population sub-groups with shared risk characteristics, iii) identify non-linear interaction effects among risk factors for chronic kidney disease, and iv) monitor a machine learning model deployed in a hospital by identifying which features are degrading the model's performance over time. Given the popularity of tree-based machine learning models, these improvements to their interpretability have implications across a broad set of domains.
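The game-theoretic explanations referred to above are Shapley values. To make the idea concrete, the sketch below computes exact Shapley values by brute-force enumeration of feature subsets, which takes exponential time and is therefore not the polynomial-time tree algorithm the paper contributes. The toy model `f`, the baseline, and the convention of replacing absent features with baseline values are simplifying assumptions for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(value, n):
    """Exact Shapley values for an n-player game.

    value: function mapping a frozenset of player indices to a number.
    """
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Probability that exactly this coalition precedes player i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def f(x):
    """Hypothetical model with one additive term and one interaction."""
    return 2.0 * x[0] + x[1] * x[2]


x = (1.0, 1.0, 1.0)         # instance to explain
baseline = (0.0, 0.0, 0.0)  # reference input (assumed)


def value(subset):
    """Model output when only the features in `subset` take their real values."""
    mixed = tuple(x[i] if i in subset else baseline[i] for i in range(len(x)))
    return f(mixed)


phi = shapley_values(value, len(x))
print(phi)
print(sum(phi), f(x) - f(baseline))  # attributions add up to the output change
```

The interaction term `x[1] * x[2]` is split evenly between the two features involved, which is the kind of effect the paper's interaction-aware explanations aim to separate out explicitly.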

artificial intelligence, decision tree learning, machine learning, (19 more...)

arXiv.org Machine Learning

1905.0461

Country: North America > United States (1.00)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Nephrology (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Education (1.00)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)
