AITopics | Clustering

Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)

Efficient Data Analytics on Augmented Similarity Triplets

Ahmad, Muhammad, Shakeel, Muhammad Haroon, Ali, Sarwan, Khan, Imdadullah, Zaman, Arif, Karim, Asim

arXiv.org Machine Learning, Dec-27-2019

Many machine learning methods (classification, clustering, etc.) start with a known kernel that provides a similarity or distance measure between two objects. Recent work has extended this to situations where the information about objects is limited to comparisons of distances between three objects (triplets). Humans find the comparison task much easier than the estimation of absolute similarities, so this kind of data can be easily obtained using crowd-sourcing. In this work, we give an efficient method of augmenting triplet data by using additional implicit information inferred from the existing data. Triplet augmentation improves the quality of kernel-based and kernel-free data analytics tasks. We also propose a novel set of algorithms for common supervised and unsupervised machine learning tasks based on triplets. These methods work directly with triplets, avoiding kernel evaluations. Experimental evaluation on real and synthetic datasets shows that our methods are more accurate than the current best-known techniques.
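
One way to picture "implicit information inferred from the existing data" is transitivity: if a triplet (a, b, c) is read as d(a, b) < d(a, c), then (a, b, c) and (a, c, e) together imply (a, b, e). The sketch below is an illustration under that assumed reading, not the authors' algorithm; the function name `augment_triplets` and the tuple encoding are hypothetical.

```python
from collections import defaultdict


def augment_triplets(triplets):
    """Return the input triplets plus those implied by transitivity.

    Assumption: a triplet (a, b, c) means d(a, b) < d(a, c).
    For a fixed anchor a, (a, b, c) and (a, c, e) imply (a, b, e).
    """
    # closer[a][b] = objects known to be farther from anchor a than b is
    closer = defaultdict(lambda: defaultdict(set))
    for a, b, c in triplets:
        closer[a][b].add(c)

    augmented = set()
    for a, graph in closer.items():
        for start in list(graph):
            # Depth-first search over the "farther than" relation for anchor a
            stack, seen = list(graph[start]), set()
            while stack:
                node = stack.pop()
                if node in seen:
                    continue
                seen.add(node)
                stack.extend(graph.get(node, ()))
            # Skip self-comparisons, which can only arise from contradictory input
            augmented.update((a, start, far) for far in seen if far != start)
    return augmented


observed = [("x", "p", "q"), ("x", "q", "r")]
print(sorted(augment_triplets(observed)))
```

Here the two observed comparisons for anchor `x` yield one extra triplet, `("x", "p", "r")`, without any further crowd-sourced queries.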

algorithm, dataset, triplet, (12 more...)

arXiv: 1912.12064

Country: Asia > Pakistan > Punjab > Lahore Division > Lahore (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Parameter Free Clustering with Cluster Catch Digraphs (Technical Report)

Manukyan, Artür, Ceyhan, Elvan

arXiv.org Machine Learning, Dec-26-2019

Clustering is one of the most challenging tasks in machine learning and pattern recognition, and discovering the exact number of clusters in an unlabelled data set is perhaps the hardest part of it. Many clustering methods find the clusters (or hidden classes) and the number of these clusters simultaneously (Frey and Dueck, 2007; Sajana et al., 2016). Although there exist methods for validating and comparing the quality of a partitioning of a data set, algorithms that provide the (estimated) number of clusters without any input parameter are still appealing. However, such methods or algorithms rely on other parameters that can be viewed as an intensity, i.e. the expected number of objects in a unit area. The value of the intensity parameter works as a threshold: if the local intensity of the data set exceeds the threshold, it may indicate the existence of a possible cluster. The choice of such parameters is often difficult, since different values may drastically change the result of the algorithm. We use unsupervised adaptations of a family of vertex random digraphs, namely class cover catch digraphs (CCCDs), that showed relatively good performance in statistical pattern classification (Manukyan and Ceyhan, 2016; Priebe et al., 2003a). Unsupervised versions of CCCDs are called cluster catch digraphs (CCDs) (DeVinney, 2003; Marchette, 2004). Primarily, CCDs use statistics that require an intensity parameter to be specified or estimated.

algorithm, digraph, rk-ccd, (12 more...)

arXiv: 1912.11926

Country:

Europe > Austria > Vienna (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
(10 more...)

Genre: Research Report (0.81)

Industry: Health & Medicine (0.92)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Learning with Wasserstein barycenters and applications

Domazakis, G., Drivaliaris, D., Koukoulas, S., Papayiannis, G., Tsekrekos, A., Yannacopoulos, A.

arXiv.org Machine Learning, Dec-26-2019

In this work, learning schemes for measure-valued data are proposed, i.e. data whose structure is more efficiently represented as probability measures than as points in $\mathbb{R}^d$, employing the concept of probability barycenters as defined with respect to the Wasserstein metric. Such learning approaches are valuable in many fields where the observational/experimental error is significant (e.g. astronomy, biology, remote sensing, etc.) or where the data are more complex in nature and traditional learning algorithms are not applicable or effective (e.g. network data, interval data, high frequency records, matrix data, etc.). Under this perspective, each observation is identified by an appropriate probability measure, and the proposed statistical learning schemes rely on discrimination criteria that utilize the geometric structure of the space of probability measures through core techniques from optimal transport theory. The discussed approaches are implemented in two real world applications: (a) clustering eurozone countries according to their observed government bond yield curves and (b) classifying the areas of a satellite image into land-use categories, which is a standard task in remote sensing. In both case studies the results are particularly interesting and meaningful, while the accuracy obtained is high.
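
For intuition, the one-dimensional case admits a closed form: for empirical measures with the same number of equally weighted samples, the 2-Wasserstein distance compares sorted samples, and the barycenter is the weighted average of the sorted samples (the quantile functions are averaged). The sketch below covers only this simple case and is not the paper's learning scheme; the function names `wasserstein2_1d` and `barycenter_1d` are hypothetical.

```python
import math


def wasserstein2_1d(xs, ys):
    """2-Wasserstein distance between two equal-size 1D empirical measures."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    # In 1D the optimal coupling matches the i-th smallest of xs to the i-th smallest of ys
    squared = sum((x - y) ** 2 for x, y in zip(sorted(xs), sorted(ys)))
    return math.sqrt(squared / len(xs))


def barycenter_1d(samples, weights=None):
    """Wasserstein barycenter of equal-size 1D samples via quantile averaging."""
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("this sketch assumes equal sample sizes")
    if weights is None:
        weights = [1.0 / len(samples)] * len(samples)
    ordered = [sorted(s) for s in samples]
    # Average the order statistics position by position
    return [sum(w * s[i] for w, s in zip(weights, ordered)) for i in range(n)]


a = [0.0, 1.0, 2.0]
b = [4.0, 2.0, 3.0]
center = barycenter_1d([a, b])
print(center, wasserstein2_1d(a, center), wasserstein2_1d(b, center))
```

A nearest-barycenter rule, assigning an observation to the class or cluster whose barycenter is closest in this distance, is one plausible way to use such a centroid for clustering or classification.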

barycenter, environment type, probability measure, (17 more...)

arXiv: 1912.11801

Country:

Europe > Austria > Vienna (0.14)
Europe > Portugal (0.04)
Europe > Ireland (0.04)
(9 more...)

Genre: Research Report (0.40)

Industry:

Banking & Finance (0.48)
Government (0.34)
Law (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Evolutionary Clustering via Message Passing

Arzeno, Natalia M., Vikalo, Haris

arXiv.org Artificial Intelligence, Dec-26-2019

We are often interested in clustering objects that evolve over time and identifying solutions to the clustering problem for every time step. Evolutionary clustering provides insight into cluster evolution and temporal changes in cluster memberships while enabling performance superior to that achieved by independently clustering data collected at different time points. In this paper we introduce evolutionary affinity propagation (EAP), an evolutionary clustering algorithm that groups data points by exchanging messages on a factor graph. EAP promotes temporal smoothness of the solution to clustering time-evolving data by linking the nodes of the factor graph that are associated with adjacent data snapshots, and introduces consensus nodes to enable cluster tracking and identification of cluster births and deaths. Unlike existing evolutionary clustering methods that require additional processing to approximate the number of clusters or match them across time, EAP determines the number of clusters and tracks them automatically. A comparison with existing methods on simulated and experimental data demonstrates the effectiveness of the proposed EAP algorithm.

consensus node, exemplar, time step, (14 more...)

arXiv: 1912.1197

Country:

North America > Puerto Rico (0.04)
North America > United States > Massachusetts (0.04)
North America > United States > California (0.04)
(19 more...)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Health Care Providers & Services > Reimbursement (1.00)
Health & Medicine > Government Relations & Public Policy (1.00)
(4 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Machine Learning Interview Questions And Answers

#artificialintelligence, Dec-24-2019, 21:36:23 GMT

Machine learning (ML) is a rising field. It offers many interesting and well-paid jobs and opportunities. Each of these and some other items might be touched on in an ML interview. There is a large number of possible questions and topics. This article presents 12 general questions (with brief answers) appropriate mainly for beginner and intermediate candidates.

learning interview question and answer, machine learning interview question, probability, (9 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.32)

Self-adaption grey DBSCAN clustering

Lu, Shizhan

arXiv.org Machine Learning, Dec-23-2019

Clustering analysis, a classical issue in data mining, is widely used in various research areas. This article proposes a self-adaption grey DBSCAN clustering (SAG-DBSCAN) algorithm. First, the grey relational matrix is used to obtain the grey local density indicator. This indicator is then applied to perform self-adapting noise identification, which yields a dense subset of the clustering dataset. Finally, a DBSCAN variant that selects its parameters automatically is used to cluster the dense subset. Several frequently used datasets were used to demonstrate the performance and effectiveness of the proposed clustering algorithm and to compare the results with those of other state-of-the-art algorithms. The comprehensive comparisons indicate that our method has advantages over the other compared methods.
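
As background for the last step, the following is a minimal sketch of plain DBSCAN, in which the two parameters `eps` and `min_pts` must be supplied by hand. It does not reproduce the grey relational noise identification or the automatic parameter selection described above; it only shows the baseline procedure those steps feed into.

```python
import math


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    n = len(points)
    labels = [None] * n  # None means "not visited yet"

    def neighbors(i):
        # Brute-force range query; the point itself counts as a neighbor
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point, keep expanding
    return labels


data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(data, eps=1.5, min_pts=3))
```

The result depends strongly on `eps` and `min_pts`, which is the sensitivity that self-adapting variants aim to remove.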

algorithm, dataset, grey relationship degree, (13 more...)

arXiv: 1912.11477

Country:

Asia > China > Jiangsu Province > Nanjing (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report (0.82)

Industry: Health & Medicine (0.94)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Data Science > Data Mining (0.88)

An Entropy-based Variable Feature Weighted Fuzzy k-Means Algorithm for High Dimensional Data

Singh, Vikas, Verma, Nishchal K.

arXiv.org Machine Learning, Dec-23-2019

This paper presents a new fuzzy k-means algorithm for the clustering of high dimensional data in various subspaces. In high dimensional data, some features may be irrelevant, and the relevant ones may differ in their significance for the clustering. For a better clustering, it is crucial to incorporate the contribution of these features in the clustering process. To combine these features, we propose a new fuzzy k-means clustering algorithm in which the objective function of fuzzy k-means is modified using two different entropy terms. The first entropy term helps to minimize the within-cluster dispersion and maximize the negative entropy to determine clusters to contribute to the association of data points. The second entropy term helps to control the weight of the features, because different features have different contributing weights in the clustering process for obtaining a better partition of the data. The efficacy of the proposed method is presented in terms of various clustering measures on multiple datasets and compared with various state-of-the-art methods.
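
The abstract does not give the objective function, so the sketch below uses the generic entropy-regularized form as an assumption: minimizing the weighted within-cluster dispersion plus `gamma` times the sum of u log u, subject to memberships summing to one, gives memberships proportional to exp(-distance / gamma). The names `entropy_memberships`, `feature_weights`, and `gamma` are hypothetical, and the feature weights are taken as given rather than learned.

```python
import math


def entropy_memberships(point, centers, feature_weights, gamma):
    """Soft cluster memberships of one point under an entropy-regularized objective.

    Assumption: memberships are proportional to exp(-d / gamma), where d is the
    feature-weighted squared distance from the point to a cluster center.
    """
    dists = [
        sum(w * (x - c) ** 2 for w, x, c in zip(feature_weights, point, center))
        for center in centers
    ]
    # Shift by the smallest distance so the exponentials cannot all underflow
    d_min = min(dists)
    scores = [math.exp(-(d - d_min) / gamma) for d in dists]
    total = sum(scores)
    return [s / total for s in scores]


centers = [(0.0, 0.0), (2.0, 0.0)]
# Both features count equally: the point clearly belongs to the first center
print(entropy_memberships((0.0, 0.0), centers, feature_weights=[1.0, 1.0], gamma=1.0))
# The first feature is given zero weight: the centers become indistinguishable
print(entropy_memberships((0.0, 0.0), centers, feature_weights=[0.0, 1.0], gamma=1.0))
```

The second call shows why feature weights matter: once the only discriminating feature is switched off, the memberships collapse to an even split.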

algorithm, entropy, partition, (14 more...)

arXiv: 1912.11209

Country: North America > United States > California > Alameda County > Oakland (0.04)

Genre: Research Report (0.71)

Industry: Health & Medicine > Therapeutic Area (0.51)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Interpretable Embeddings From Molecular Simulations Using Gaussian Mixture Variational Autoencoders

Varolgunes, Yasemin Bozkurt, Bereau, Tristan, Rudzinski, Joseph F.

arXiv.org Machine Learning, Dec-22-2019

Extracting insight from the enormous quantity of data generated from molecular simulations requires the identification of a small number of collective variables whose corresponding low-dimensional free-energy landscape retains the essential features of the underlying system. Data-driven techniques provide a systematic route to constructing this landscape, without the need for extensive a priori intuition into the relevant driving forces. In particular, autoencoders are powerful tools for dimensionality reduction, as they naturally force an information bottleneck and, thereby, a low-dimensional embedding of the essential features. While variational autoencoders ensure continuity of the embedding by assuming a unimodal Gaussian prior, this is at odds with the multi-basin free-energy landscapes that typically arise from the identification of meaningful collective variables. In this work, we incorporate this physical intuition into the prior by employing a Gaussian mixture variational autoencoder (GMVAE), which encourages the separation of metastable states within the embedding. The GMVAE performs dimensionality reduction and clustering within a single unified framework, and is capable of identifying the inherent dimensionality of the input data, in terms of the number of Gaussians required to categorize the data. We illustrate our approach on two toy models, alanine dipeptide, and a challenging disordered peptide ensemble, demonstrating the enhanced clustering effect of the GMVAE prior compared to standard VAEs. The resulting embeddings appear to be promising representations for constructing Markov state models, highlighting the transferability of the dimensionality reduction from static equilibrium properties to dynamics.

gmvae, interpretable, molecular simulation, (13 more...)

arXiv: 1912.12175

Country:

North America > United States (0.14)
Europe > Germany > Rheinland-Pfalz > Mainz (0.04)
Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
(3 more...)

Genre: Research Report (0.82)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Interactive Open-Ended Learning for 3D Object Recognition

Kasaei, S. Hamidreza

arXiv.org Artificial Intelligence, Dec-19-2019

The thesis contributes in several important ways to the research area of 3D object category learning and recognition. To cope with the mentioned limitations, we look at human cognition, in particular at the fact that human beings learn to recognize object categories ceaselessly over time. This ability to refine knowledge from the set of accumulated experiences facilitates the adaptation to new environments. Inspired by this capability, we seek to create a cognitive object perception and perceptual learning architecture that can learn 3D object categories in an open-ended fashion. In this context, "open-ended" implies that the set of categories to be learned is not known in advance, and the training instances are extracted from actual experiences of a robot, and thus become gradually available, rather than being available from the beginning of the learning process. In particular, this architecture provides perception capabilities that will allow robots to incrementally learn object categories from the set of accumulated experiences and reason about how to perform complex tasks. This framework integrates detection, tracking, teaching, learning, and recognition of objects. An extensive set of systematic experiments, in multiple experimental settings, was carried out to thoroughly evaluate the described learning approaches. Experimental results show that the proposed system is able to interact with human users, learn new object categories over time, and perform complex tasks. The contributions presented in this thesis have been fully implemented and evaluated on different standard object and scene datasets and empirically evaluated on different robotic platforms.

category learning and recognition, object category learning and recognition, recognition and pose estimation system, (15 more...)

arXiv: 1912.09539

Country:

Europe > Portugal > Aveiro > Aveiro (0.04)
South America > Uruguay > Maldonado > Maldonado (0.04)
South America > Paraguay > Asunción > Asunción (0.04)
(13 more...)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)
Instructional Material (1.00)

Industry:

Health & Medicine (1.00)
Leisure & Entertainment (0.92)
Education > Educational Setting > Online (0.45)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
(5 more...)

Balancing the Tradeoff Between Clustering Value and Interpretability

Saisubramanian, Sandhya, Galhotra, Sainyam, Zilberstein, Shlomo

arXiv.org Machine Learning, Dec-18-2019

Graph clustering groups entities (the vertices of a graph) based on their similarity, typically using a complex distance function over a large number of features. Successful integration of clustering approaches in automated decision-support systems hinges on the interpretability of the resulting clusters. This paper addresses the problem of generating interpretable clusters, given features of interest that signify interpretability to an end-user, by optimizing interpretability in addition to common clustering objectives. We propose a $\beta$-interpretable clustering algorithm that ensures that at least a $\beta$ fraction of nodes in each cluster share the same feature value. The tunable parameter $\beta$ is user-specified. We also present a more efficient algorithm for scenarios with $\beta = 1$ and analyze the theoretical guarantees of the two algorithms. Finally, we empirically demonstrate the benefits of our approaches in generating interpretable clusters using four real-world datasets. The interpretability of the clusters is complemented by generating simple explanations denoting the feature values of the nodes in the clusters, using frequent pattern mining.
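
The $\beta$ condition itself is easy to state in code. The sketch below only checks whether a given clustering satisfies it for a single feature of interest; it is not the proposed clustering algorithm, and the function name `is_beta_interpretable` and the dictionary encoding of feature values are hypothetical.

```python
from collections import Counter


def is_beta_interpretable(clusters, feature_of, beta):
    """Check that in every cluster at least a beta fraction of nodes share one feature value.

    clusters:   iterable of collections of node ids
    feature_of: mapping from node id to its value for the feature of interest
    beta:       required fraction, between 0 and 1
    """
    for members in clusters:
        if not members:
            continue  # an empty cluster places no constraint
        counts = Counter(feature_of[node] for node in members)
        majority = counts.most_common(1)[0][1]  # size of the largest same-value group
        if majority < beta * len(members):
            return False
    return True


clusters = [["a", "b", "c", "d"], ["e", "f"]]
feature_of = {"a": "red", "b": "red", "c": "red", "d": "blue", "e": "green", "f": "green"}
print(is_beta_interpretable(clusters, feature_of, beta=0.75))
print(is_beta_interpretable(clusters, feature_of, beta=1.0))
```

With $\beta = 1$ every cluster must be pure in the chosen feature, which is why the first cluster (three red nodes, one blue) passes at 0.75 but fails at 1.0.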

data mining, machine learning, pattern recognition, (20 more...)

arXiv: 1912.0782

Country:

Africa > Kenya (0.05)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
(2 more...)

Genre: Research Report (0.50)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (1.00)
