AITopics | Clustering

Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)
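As a concrete illustration of the grouping task defined above, the following is a minimal sketch of Lloyd's k-means iteration, one of many possible clustering procedures. It is not taken from any paper listed on this page; the function name `kmeans`, the deterministic initialization from the first `k` points, and the sample data are illustrative assumptions.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def kmeans(points, k, iterations=20):
    """Group points into k clusters with plain Lloyd iterations.

    Assumption: the first k points serve as initial centers, which keeps
    the sketch deterministic. Practical code would use a better seeding.
    """
    centers = [tuple(points[i]) for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda j: dist(p, centers[j])) for p in points]
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centers


points = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0), (11, 10)]
labels, centers = kmeans(points, k=2)
print(labels)
```

Points that share a label are closer to their own center than to the other one, which is the "more similar to each other than to those in other groups" criterion expressed with Euclidean distance.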

ELKI: A large open-source library for data analysis - ELKI Release 0.7.5 "Heidelberg"

Schubert, Erich, Zimek, Arthur

arXiv.org Machine Learning, Feb-10-2019

This paper documents the release of the ELKI data mining framework, version 0.7.5. ELKI is open source (AGPLv3) data mining software written in Java. The focus of ELKI is research in algorithms, with an emphasis on unsupervised methods in cluster analysis and outlier detection. In order to achieve high performance and scalability, ELKI offers data index structures such as the R*-tree that can provide major performance gains. ELKI is designed to be easy to extend for researchers and students in this domain, and welcomes contributions of additional methods. ELKI aims at providing a large collection of highly parameterizable algorithms, in order to allow easy and fair evaluation and benchmarking of algorithms. We will first outline the motivation for this release and the plans for the future, and then give a brief overview of the new functionality in this version. We also include an appendix presenting an overview of the overall implemented functionality.

density-based clustering, survey article, upstream oil & gas, (22 more...)

arXiv.org Machine Learning

1902.03616

Country:

North America > United States > Wisconsin (0.14)
North America > United States > Virginia (0.13)
North America > United States > New York (0.13)
(3 more...)

Genre: Research Report > Experimental Study (0.45)

Industry: Energy > Oil & Gas > Upstream (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
(4 more...)

Spectral-Spatial Diffusion Geometry for Hyperspectral Image Clustering

Murphy, James M., Maggioni, Mauro

arXiv.org Machine Learning, Feb-8-2019

An unsupervised learning algorithm to cluster hyperspectral image (HSI) data is proposed that exploits spatially-regularized random walks. Markov diffusions are defined on the space of HSI spectra with transitions constrained to near spatial neighbors. The explicit incorporation of spatial regularity into the diffusion construction leads to smoother random processes that are more adapted for unsupervised machine learning than those based on spectra alone. The regularized diffusion process is subsequently used to embed the high-dimensional HSI into a lower dimensional space through diffusion distances. Cluster modes are computed using density estimation and diffusion distances, and all other points are labeled according to these modes. The proposed method has low computational complexity and performs competitively against state-of-the-art HSI clustering algorithms on real data. In particular, the proposed spatial regularization confers an empirical advantage over non-regularized methods.

algorithm, dataset, neighbor, (14 more...)

arXiv.org Machine Learning

1902.05402

Country:

North America > United States > Florida > Brevard County (0.06)
Europe > Finland > Uusimaa > Helsinki (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Industry: Government (0.31)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Improving Deep Image Clustering With Spatial Transformer Layers

Souza, Thiago V. M., Zanchettin, Cleber

arXiv.org Machine Learning, Feb-8-2019

Deep image clustering is a recent research area that has already produced exciting published works [15]. The approaches use highly diverse architectures, varying the structure of the deep networks, the clustering algorithms, and the combination of both parts. Approaches such as the Deep Clustering Network (DCN) [9] use a pretrained autoencoder combined with the k-means algorithm. Methods such as Joint Unsupervised Learning (JULE) [10] combine deep convolutional networks with hierarchical clustering. Deep Embedded Clustering (DEC) [11] also uses a pretrained autoencoder, then removes the decoder part and uses the encoder as a feature extractor to feed the clustering method. After that, the network is fine-tuned using the cluster assignment hardening loss. Meanwhile, the clusters are iteratively tuned by minimizing the KL-divergence between the distribution of soft labels and the auxiliary target distribution.
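The KL-divergence objective mentioned at the end of this abstract can be sketched numerically. The snippet below assumes the formulation commonly given for DEC: soft labels from a Student's t kernel between embedded points and cluster centers, and an auxiliary target obtained by squaring the soft labels and normalizing by cluster frequency. The abstract itself does not spell out these formulas, so the kernel, the `alpha` parameter, the function names, and the toy embeddings are all assumptions, and no network training is shown.

```python
import numpy as np


def soft_assign(z, mu, alpha=1.0):
    """Soft labels q[i, j]: Student's t similarity of embedding z[i] to center mu[j]."""
    d2 = ((z[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)  # squared distances, shape (n, k)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)  # each row sums to 1


def target_distribution(q):
    """Auxiliary target p: square q, divide by soft cluster frequency, renormalize rows."""
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)


def kl_divergence(p, q):
    """KL(P || Q) summed over all points and clusters."""
    return float(np.sum(p * np.log(p / q)))


# Toy embeddings and centers (hypothetical values, for illustration only).
z = np.array([[0.0, 0.0], [0.2, 0.1], [3.0, 3.0], [3.1, 2.9]])
mu = np.array([[0.0, 0.0], [3.0, 3.0]])

q = soft_assign(z, mu)
p = target_distribution(q)
loss = kl_divergence(p, q)
print(q.round(3), p.round(3), loss)
```

Because the target sharpens the soft labels, minimizing this loss pushes each point toward a more confident assignment, which is the "hardening" effect the abstract refers to.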

experiment, st layer, transformation, (16 more...)

arXiv.org Machine Learning

1902.05401

Country:

South America > Brazil > Pernambuco > Recife (0.04)
North America > United States (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Bounded Fuzzy Possibilistic Method

Yazdani, Hossein

arXiv.org Machine Learning, Feb-8-2019

This paper introduces the Bounded Fuzzy Possibilistic Method (BFPM) by addressing several issues that previous clustering/classification methods have not considered. In fuzzy clustering, an object's membership values should sum to 1. Hence, any object may obtain full membership in at most one cluster. Possibilistic clustering methods remove this restriction. However, BFPM differs from previous fuzzy and possibilistic clustering approaches by allowing the membership function to take larger values with respect to all clusters. Furthermore, in BFPM, a data object can have full membership in multiple clusters or even in all clusters. BFPM relaxes the boundary conditions (restrictions) in membership assignment. The proposed methodology satisfies the necessity of obtaining full memberships and overcomes the issues that conventional methods have in dealing with overlapping. Analysing the objects' movements from their own cluster to another (mutation) is also proposed in this paper. BFPM has been applied in different domains in geometry, set theory, anomaly detection, risk management, disease diagnosis, and other disciplines. Validity and comparison indexes have also been used to evaluate the accuracy of BFPM. BFPM has been evaluated in terms of accuracy, fuzzification constant (different norms), objects' movement analysis, and covering diversity. The promising results prove the importance of considering the proposed methodology in learning methods to track the behaviour of data objects, in addition to obtaining accurate results.
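The difference between the sum-to-one restriction and a relaxed membership assignment can be made concrete with a small check. This is only a sketch of the constraints as the abstract describes them: the helper names and the exact form of the relaxed condition (each value in [0, 1], at least one positive) are assumptions, not the formal BFPM definition.

```python
def is_fuzzy_row(memberships, tol=1e-9):
    """Conventional fuzzy constraint: values in [0, 1] that sum to 1."""
    in_range = all(0.0 <= u <= 1.0 for u in memberships)
    return in_range and abs(sum(memberships) - 1.0) <= tol


def is_bounded_row(memberships):
    """Relaxed constraint (assumed form): values in [0, 1], at least one positive,
    with no requirement that they sum to 1."""
    in_range = all(0.0 <= u <= 1.0 for u in memberships)
    return in_range and any(u > 0.0 for u in memberships)


def full_membership_count(memberships, tol=1e-9):
    """Number of clusters in which the object has full membership (value 1)."""
    return sum(1 for u in memberships if abs(u - 1.0) <= tol)


# Hypothetical membership rows for one object over three clusters.
fuzzy_object = [0.7, 0.2, 0.1]      # satisfies the sum-to-one restriction
boundary_object = [1.0, 1.0, 0.3]   # full membership in two overlapping clusters

for row in (fuzzy_object, boundary_object):
    print(row, is_fuzzy_row(row), is_bounded_row(row), full_membership_count(row))
```

The second row is rejected by the fuzzy constraint but accepted by the relaxed one, which is the situation the abstract describes for objects lying in overlapping clusters.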

bfpm, membership assignment, transaction, (13 more...)

arXiv.org Machine Learning

1902.03127

Country:

Europe > Poland > Lower Silesia Province > Wroclaw (0.04)
South America > Paraguay > Asunción > Asunción (0.04)
North America > United States > Texas (0.04)
(2 more...)

Genre: Research Report > New Finding (0.34)

Industry:

Health & Medicine > Therapeutic Area > Oncology (0.68)
Information Technology > Security & Privacy (0.48)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Online Clustering by Penalized Weighted GMM

Bugdary, Shlomo, Maymon, Shay

arXiv.org Machine Learning, Feb-7-2019

With the dawn of the Big Data era, data sets are growing rapidly. Data is streaming from everywhere - from cameras, mobile phones, cars, and other electronic devices. Clustering streaming data is a very challenging problem. Unlike traditional clustering algorithms, where the dataset can be stored and scanned multiple times, clustering streaming data has to satisfy constraints such as limited memory size, real-time response, unknown data statistics, and an unknown number of clusters. In this paper, we present a novel online clustering algorithm which can be used to cluster streaming data without knowing the number of clusters a priori. Results on both synthetic and real datasets show that the proposed algorithm produces partitions which are close to what you could get if you clustered the whole data at one time.

algorithm, covariance matrix, dataset, (16 more...)

arXiv.org Machine Learning

1902.02544

Country: North America > United States > Massachusetts > Plymouth County > Norwell (0.04)

Genre: Research Report (0.40)

Industry: Semiconductors & Electronics (0.48)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Nearly Optimal Dynamic $k$-Means Clustering for High-Dimensional Data

Hu, Wei, Song, Zhao, Yang, Lin F., Zhong, Peilin

arXiv.org Machine Learning, Feb-7-2019

We consider the $k$-means clustering problem in the dynamic streaming setting, where points from a discrete Euclidean space $\{1, 2, \ldots, \Delta\}^d$ can be dynamically inserted to or deleted from the dataset. For this problem, we provide a one-pass coreset construction algorithm using space $\tilde{O}(k\cdot \mathrm{poly}(d, \log\Delta))$, where $k$ is the target number of centers. To our knowledge, this is the first dynamic geometric data stream algorithm for $k$-means using space polynomial in dimension and nearly optimal (linear) in $k$.

algorithm, optimal dynamic k-means clustering, probability, (11 more...)

arXiv.org Machine Learning

1802.00459

Country:

Asia > Afghanistan > Parwan Province > Charikar (0.04)
North America > United States > California > Santa Clara County > San Jose (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.90)

Automating Interpretability: Discovering and Testing Visual Concepts Learned by Neural Networks

Ghorbani, Amirata, Wexler, James, Kim, Been

arXiv.org Machine Learning, Feb-6-2019

Interpretability has become an important topic of research as more machine learning (ML) models are deployed and widely used to make important decisions. For high-stakes domains such as medicine, providing intuitive explanations that can be consumed by domain experts without ML expertise becomes crucial. To meet this demand, concept-based methods (e.g., TCAV) were introduced to provide explanations using user-chosen high-level concepts rather than individual input features. While these methods successfully leverage rich representations learned by the networks to reveal how human-defined concepts are related to the prediction, they require users to select concepts of their choice and collect labeled examples of those concepts. In this work, we introduce DTCAV (Discovery TCAV), a global concept-based interpretability method that can automatically discover concepts as image segments, along with each concept's estimated importance for a deep neural network's predictions. We validate that discovered concepts are as coherent to humans as hand-labeled concepts. We also show that the discovered concepts carry significant signal for prediction by analyzing a network's performance with stitched/added/deleted concepts. DTCAV results revealed a number of undesirable correlations (e.g., a basketball player's jersey was a more important concept for predicting the basketball class than the ball itself) and show the potential shallow reasoning of these networks.

experiment, prediction, tcav score 0, (12 more...)

arXiv.org Machine Learning

1902.03129

Country:

Europe (0.48)
North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre: Research Report (0.50)

Industry:

Leisure & Entertainment > Sports (0.48)
Law (0.46)
Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.93)

An Automated Spectral Clustering for Multi-scale Data

Afzalan, Milad, Jazizadeh, Farrokh

arXiv.org Machine Learning, Feb-5-2019

Spectral clustering algorithms typically require a priori selection of input parameters such as the number of clusters, a scaling parameter for the affinity measure, or ranges of these values for parameter tuning. Despite efforts to automate the process of spectral clustering, the task of grouping data in multi-scale and higher dimensional spaces is yet to be explored. This study presents a spectral clustering heuristic algorithm that obviates the need for an input by estimating the parameters from the data itself. Specifically, it introduces the heuristic of iterative eigengap search with (1) global scaling and (2) local scaling. These approaches estimate the scaling parameter and implement iterative eigengap quantification along a search tree to reveal dissimilarities at different scales of a feature space and identify clusters. The performance of these approaches has been tested on various real-world datasets of power variation with multi-scale nature and gene expression. Our findings show that iterative eigengap search with a PCA-based global scaling scheme can discover different patterns with an accuracy higher than 90% in most cases without asking for a priori input information.
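For readers unfamiliar with the eigengap idea this abstract builds on, here is a minimal sketch of the basic, non-iterative eigengap heuristic: the number of clusters is estimated from the largest gap between consecutive eigenvalues of a normalized graph Laplacian. It is not the paper's iterative search, and the fixed global `sigma`, the function name, and the toy data are assumptions made for illustration.

```python
import numpy as np


def estimate_num_clusters(points, sigma=1.0):
    """Estimate the cluster count from the largest Laplacian eigengap.

    Assumes a Gaussian affinity with one global scale `sigma` and that every
    point has at least one neighbor with non-negligible affinity.
    """
    x = np.asarray(points, dtype=float)
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=2)  # pairwise squared distances
    w = np.exp(-d2 / (2.0 * sigma ** 2))                      # Gaussian affinity matrix
    np.fill_diagonal(w, 0.0)                                  # no self-loops
    d_inv_sqrt = 1.0 / np.sqrt(w.sum(axis=1))
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}
    lap = np.eye(len(x)) - d_inv_sqrt[:, None] * w * d_inv_sqrt[None, :]
    eigvals = np.linalg.eigvalsh(lap)                         # ascending order
    gaps = np.diff(eigvals)                                   # gaps[i] = eigvals[i+1] - eigvals[i]
    return int(np.argmax(gaps)) + 1, eigvals


# Two well-separated unit squares (hypothetical data).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
k, eigvals = estimate_num_clusters(points, sigma=1.0)
print(k)
```

With a single global scale, this heuristic can miss structure when clusters differ greatly in density or size, which is the multi-scale difficulty the abstract says its global and local scaling schemes are meant to address.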

algorithm, dataset, spectral, (12 more...)

arXiv.org Machine Learning

1902.0199

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Virginia > Montgomery County > Blacksburg (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)

Genre: Research Report > New Finding (0.86)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.88)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Hyperbox based machine learning algorithms: A comprehensive survey

Khuat, Thanh Tung, Ruta, Dymitr, Gabrys, Bogdan

arXiv.org Machine Learning, Feb-4-2019

With the rapid development of digital information, the data volume generated by humans and machines is growing exponentially. Along with this trend, machine learning algorithms have been formed and have evolved continuously to discover new information and knowledge from different data sources. Learning algorithms that use hyperboxes as fundamental representational units and building blocks are a branch of machine learning methods. These algorithms have enormous potential for high scalability and online adaptation of predictors built using hyperbox data representations to dynamically changing environments and streaming data. This paper aims to give a comprehensive survey of the literature on hyperbox-based machine learning models. In general, according to the architecture and characteristic features of the resulting models, the existing hyperbox-based learning algorithms may be grouped into three major categories: fuzzy min-max neural networks, hyperbox-based hybrid models, and other algorithms based on hyperbox representation. Within each of these groups, this paper gives a brief description of the structure of models, associated learning algorithms, and an analysis of their advantages and drawbacks. The main applications of these hyperbox-based models to real-world problems are also described in this paper. Finally, we discuss some open problems and identify potential future research directions in this field.

hyperboxe, midstream oil & gas, vascular disease, (28 more...)

arXiv.org Machine Learning

1901.11303

Country:

Europe (0.27)
Asia > Malaysia (0.14)
Asia > Vietnam (0.14)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (1.00)
(6 more...)

Spaces of Clusterings

Rolle, Alexander, Scoccola, Luis

arXiv.org Machine Learning, Feb-4-2019

Often, a clustering algorithm, rather than producing a single clustering of a dataset, produces a set of clusterings. For example, one gets a set of clusterings by running a clustering algorithm with a range of parameters, or with many initializations. Given a set S of clusterings of a dataset X, one may want to know how many different kinds of clusterings the set S contains, ignoring small differences between elements of S. In effect, one may want to cluster S. This paper proposes two clustering algorithms, specifically for use on sets of clusterings of a fixed dataset. The starting point is the observation that sets of clusterings have geometric structure. Indeed, there are many ways, described in the literature, to define a metric on the set of all clusterings of a fixed dataset, and it is a natural idea to use such metrics to cluster a set of clusterings.
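One metric on clusterings of a fixed dataset that appears in the literature is the variation of information. The sketch below computes it for clusterings given as label lists and builds the pairwise distance matrix for a small set S. Whether the paper uses this particular metric is not stated in the abstract, so its choice here, along with the function name and sample labels, is an assumption for illustration.

```python
from collections import Counter
from math import log


def variation_of_information(a, b):
    """VI(A, B) = H(A | B) + H(B | A) for two labelings of the same dataset."""
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("labelings must be non-empty and of equal length")
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))  # co-occurrence counts of label pairs
    vi = 0.0
    for (x, y), n_xy in joint.items():
        # p(x, y) * [log p(x, y)/p(x) + log p(x, y)/p(y)], negated
        vi -= (n_xy / n) * (log(n_xy / count_a[x]) + log(n_xy / count_b[y]))
    return vi


# A hypothetical set S of clusterings of a four-point dataset.
S = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],  # same partition as the first, with labels swapped
    [0, 1, 0, 1],  # a genuinely different partition
]
distances = [[variation_of_information(s, t) for t in S] for s in S]
for row in distances:
    print([round(d, 3) for d in row])
```

The first two clusterings are at distance zero because the metric depends only on the partition and not on label names; a distance matrix like this could then be passed to any distance-based clustering method to group the elements of S.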

algorithm, algorithm 1, dataset, (11 more...)

arXiv.org Machine Learning

1902.01436

Country:

North America > United States > New York (0.04)
North America > United States > California > Orange County > Irvine (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
