AITopics | Clustering

Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)

Kernel-estimated Nonparametric Overlap-Based Syncytial Clustering

Almodóvar-Rivera, Israel, Maitra, Ranjan

arXiv.org Machine Learning, Dec-11-2019

Commonly used clustering algorithms usually find ellipsoidal, spherical or other regularly structured clusters, but are more challenged when the underlying groups lack formal structure or definition. Syncytial clustering is the name we introduce for methods that merge groups obtained from standard clustering algorithms in order to reveal complex group structure in the data. Here, we develop a distribution-free, fully automated syncytial clustering algorithm that can be used with $k$-means and other algorithms. Our approach computes the cumulative distribution function of the normed residuals from an appropriately fit $k$-groups model and calculates the nonparametric overlap between each pair of clusters. Groups with high pairwise overlap are merged as long as the generalized overlap decreases. Our methodology is always a top performer in identifying groups with regular and irregular structures in several datasets and can be applied to datasets with scatter or incomplete records. The approach is also used to identify the distinct kinds of gamma-ray bursts in the Burst and Transient Source Experiment 4Br catalog and the distinct kinds of activation in a functional Magnetic Resonance Imaging study.

algorithm, dataset, knob-sync

1805.09505

Country:

Europe > Austria > Vienna (0.14)
Europe > Italy > Sardinia (0.04)
Europe > Italy > Apulia (0.04)

Genre: Research Report > Experimental Study (0.45)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
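
The merge step can be illustrated with a small union-find sketch. It assumes the pairwise overlap matrix has already been computed (the paper estimates it nonparametrically from the CDF of normed residuals, which is not reproduced here) and replaces the paper's stopping rule, merging while the generalized overlap decreases, with a fixed, hypothetical `threshold`. The function name and all numbers are illustrative.

```python
def merge_overlapping_groups(labels, overlap, threshold):
    """Merge clusters whose pairwise overlap exceeds `threshold`.

    labels:    cluster index of each observation (e.g. from k-means)
    overlap:   symmetric k x k matrix of pairwise cluster overlaps (assumed precomputed)
    threshold: hypothetical cut-off standing in for the paper's generalized-overlap rule
    """
    k = len(overlap)
    parent = list(range(k))

    def find(i):
        # Follow parent links to the root, halving the path as we go
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Union every pair of clusters whose overlap is above the threshold
    for i in range(k):
        for j in range(i + 1, k):
            if overlap[i][j] > threshold:
                parent[find(j)] = find(i)

    # Relabel merged groups as 0, 1, 2, ... in order of first appearance
    new_ids = {}
    return [new_ids.setdefault(find(c), len(new_ids)) for c in labels]


# Hypothetical example: four k-means groups, where groups 0 and 1 overlap strongly
labels = [0, 0, 1, 1, 2, 2, 3]
overlap = [
    [1.00, 0.30, 0.02, 0.00],
    [0.30, 1.00, 0.01, 0.00],
    [0.02, 0.01, 1.00, 0.05],
    [0.00, 0.00, 0.05, 1.00],
]
merged = merge_overlapping_groups(labels, overlap, threshold=0.1)
```

Any two clusters connected through a chain of above-threshold overlaps end up in one merged group, which is how several regular k-means pieces can combine into one irregularly shaped cluster.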

Unsupervised Feature Selection based on Adaptive Similarity Learning and Subspace Clustering

Parsa, Mohsen Ghassemi, Zare, Hadi, Ghatee, Mehdi

arXiv.org Machine Learning, Dec-10-2019

Affiliations: Faculty of New Sciences and Technologies, University of Tehran, Iran (Parsa, Zare); Department of Computer Science, Amirkabir University of Technology, Iran (Ghatee)

Feature selection methods play an important role in improving the readability of data and reducing the complexity of learning algorithms. In recent years, a variety of efforts have addressed feature selection from an unsupervised viewpoint, because labeling large datasets is laborious. In this paper, we propose a novel unsupervised feature selection approach, initiated from subspace clustering, that preserves the similarities among samples through representation learning of low-dimensional subspaces. A self-expressive model is employed to implicitly learn the cluster similarities in an adaptive manner. The proposed method not only maintains the sample similarities through subspace clustering, but also captures discriminative information through a regularized regression model. Together with a convergence analysis of the proposed method, experimental results on benchmark datasets demonstrate the effectiveness of our approach compared with state-of-the-art methods.

feature selection, objective function, unsupervised feature selection

1912.05458

Country:

Asia > Middle East > Iran > Tehran Province > Tehran (0.24)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report > Promising Solution (0.54)

Industry: Health & Medicine > Therapeutic Area (0.30)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)
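
The abstract does not spell out how features are finally ranked. A common convention in regression-based unsupervised feature selection, assumed here rather than taken from the paper, is to fit a row-sparse coefficient matrix (for example with an L2,1-norm penalty) and score each feature by the L2 norm of its row. The sketch below shows only that last ranking step on a hypothetical matrix `W`; the self-expressive subspace-clustering model that would produce `W` is not implemented.

```python
import math


def select_features(W, num_features):
    """Return indices of the `num_features` features whose rows in W have the largest L2 norm.

    W: coefficient matrix of a (row-sparse) regularized regression, one row per feature
    """
    scores = [math.sqrt(sum(w * w for w in row)) for row in W]
    ranked = sorted(range(len(W)), key=lambda i: scores[i], reverse=True)
    return ranked[:num_features]


# Hypothetical 4-feature x 2-target coefficient matrix
W = [
    [0.90, -0.10],  # feature 0: large row norm
    [0.00, 0.01],   # feature 1: nearly zero row, effectively discarded
    [0.20, 0.70],   # feature 2: large row norm
    [0.05, 0.00],   # feature 3: small row norm
]
chosen = select_features(W, 2)
```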

Transformed Subspace Clustering

Maggu, Jyoti, Majumdar, Angshul, Chouzenoux, Emilie

arXiv.org Machine Learning, Dec-10-2019

Subspace clustering assumes that the data are separable into distinct subspaces. This simple assumption does not always hold. We assume that, even if the raw data are not separable into subspaces, one can learn a representation (transform coefficients) such that the learnt representation is separable into subspaces. To achieve this goal, we embed subspace clustering techniques (locally linear manifold clustering, sparse subspace clustering and low-rank representation) into transform learning. The entire formulation is learnt jointly, giving rise to a new class of methods called transformed subspace clustering (TSC). To account for non-linearity, kernelized extensions of TSC are also proposed. To test the performance of the proposed techniques, benchmarking is performed on image clustering and document clustering datasets. Comparison with state-of-the-art clustering techniques shows that our formulation improves upon them.

clustering, formulation, subspace

1912.04734

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

A time resolved clustering method revealing longterm structures and their short-term internal dynamics

Liechti, Jonas I., Bonhoeffer, Sebastian

arXiv.org Machine Learning, Dec-9-2019

The last decades have been characterized not only by an explosive growth of data, but also by an increasing appreciation of data as a valuable resource. Its value comes with the ability to extract meaningful patterns that are of economic, societal or scientific relevance. A particular challenge is to identify patterns across time, including patterns that might only become apparent when the temporal dimension is taken into account. Here, we present a novel method that aims to achieve this by detecting dynamic clusters, i.e., structural elements that can be present over prolonged durations. It is based on an adaptive identification of majority overlaps between groups at different time points and accommodates transient decompositions in otherwise persistent dynamic clusters. As such, our method enables the detection of persistent structural elements with internal dynamics and can be applied to any classifiable data, ranging from social contact networks to arbitrary sets of time-stamped feature vectors. It provides a unique tool to study systems with non-trivial temporal dynamics, with broad applicability to scientific, societal and economic data.

artificial intelligence, machine learning, snapshot

1912.04261

Country: Europe > Switzerland (0.04)

Genre: Research Report (0.84)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.85)
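
A simplified sketch of the majority-overlap idea: a group at one time point is linked to the group at the next time point that contains a strict majority of its members, and chaining such links over consecutive snapshots traces a persistent cluster. The strict-majority rule and the name `majority_links` are assumptions for illustration; the paper's adaptive identification, including its tolerance for transient decompositions, is not reproduced.

```python
def majority_links(groups_t, groups_next):
    """Link each group at time t to the group at time t+1 holding a strict majority of its members.

    groups_t, groups_next: dicts mapping a group id to a set of member ids.
    Groups without a majority successor are left unlinked (a possible dissolution or split).
    """
    links = {}
    for a, members_a in groups_t.items():
        for b, members_b in groups_next.items():
            if 2 * len(members_a & members_b) > len(members_a):
                links[a] = b
                break  # only one group can hold a strict majority of members_a
    return links


# Hypothetical snapshots of a contact network at two consecutive time points
snapshot_1 = {"A": {1, 2, 3, 4}, "B": {5, 6, 7}}
snapshot_2 = {"X": {1, 2, 3, 8}, "Y": {4, 5}, "Z": {6, 7, 9}}
links = majority_links(snapshot_1, snapshot_2)
```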

Improved Analysis of Spectral Algorithm for Clustering

Mizutani, Tomohiko

arXiv.org Machine Learning, Dec-6-2019

Spectral algorithms are graph partitioning algorithms that partition the node set of a graph into groups by using a spectral embedding map. Clustering techniques based on these algorithms are referred to as spectral clustering and are widely used in data analysis. To gain a better understanding of why spectral clustering is successful, Peng et al. (2015) and Kolev and Mehlhorn (2016) studied the behavior of a certain type of spectral algorithm for a class of graphs called well-clustered graphs. Specifically, they imposed an assumption on the graphs and showed a performance guarantee for the spectral algorithm under it. The algorithm they studied used the spectral embedding map developed by Shi and Malik (2000). In this paper, we improve on their results, giving a better performance guarantee under a weaker assumption. We also evaluate the performance of the spectral algorithm with the spectral embedding map developed by Ng et al. (2002).

algorithm, assumption, partition

1912.02997

Country:

Asia > Middle East > Jordan (0.04)
Asia > Japan > Honshū > Chūbu > Shizuoka Prefecture > Shizuoka (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.69)
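
A minimal, dependency-free sketch of a spectral algorithm with the Ng et al. (2002) style embedding: take the top-k eigenvectors of the normalized affinity D^-1/2 W D^-1/2, rescale each row to unit length, and run k-means on the rows. Assumptions not taken from the paper: the eigenvectors are found by plain subspace iteration on the shifted matrix (I + D^-1/2 W D^-1/2)/2, which has the same eigenvectors but no negative eigenvalues, and k-means uses farthest-first initialization. This does not reproduce the paper's analysis or the Shi and Malik variant, and it is only meant for small graphs.

```python
import math


def _matmul(A, B):
    return [[sum(A[i][l] * B[l][j] for l in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def _orthonormalize(Q):
    # Gram-Schmidt on the columns of Q (n x k)
    n, k = len(Q), len(Q[0])
    basis = []
    for j in range(k):
        v = [Q[i][j] for i in range(n)]
        for b in basis:
            dot = sum(x * y for x, y in zip(v, b))
            v = [x - dot * y for x, y in zip(v, b)]
        norm = math.sqrt(sum(x * x for x in v))
        basis.append([x / norm for x in v])
    return [[basis[j][i] for j in range(k)] for i in range(n)]


def _dist2(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q))


def spectral_clustering(W, k, iters=100):
    n = len(W)
    inv_sqrt = [1.0 / math.sqrt(sum(row)) for row in W]
    # Shifted normalized affinity (I + D^-1/2 W D^-1/2) / 2: same eigenvectors, eigenvalues in [0, 1]
    M = [[0.5 * ((1.0 if i == j else 0.0) + inv_sqrt[i] * W[i][j] * inv_sqrt[j]) for j in range(n)]
         for i in range(n)]
    # Subspace iteration from deterministic start vectors 1, i, i^2, ... -> top-k eigenspace
    Q = _orthonormalize([[float(i ** p) for p in range(k)] for i in range(n)])
    for _ in range(iters):
        Q = _orthonormalize(_matmul(M, Q))
    # Ng-Jordan-Weiss step: scale every row of the embedding to unit length
    rows = [[x / math.sqrt(sum(y * y for y in row)) for x in row] for row in Q]
    # k-means on the embedded rows with farthest-first initialization
    centers = [rows[0]]
    while len(centers) < k:
        centers.append(max(rows, key=lambda r: min(_dist2(r, c) for c in centers)))
    for _ in range(50):
        labels = [min(range(k), key=lambda c: _dist2(r, centers[c])) for r in rows]
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical graph: two triangles joined by one weak edge (weight 0.1) between nodes 2 and 3
W = [[0.0] * 6 for _ in range(6)]
for u, v, w in [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
                (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0), (2, 3, 0.1)]:
    W[u][v] = W[v][u] = w
labels = spectral_clustering(W, 2)
```

Because row normalization and k-means are unaffected by a rotation of the eigenvector basis, the sketch only needs the top-k eigenspace, not individually ordered eigenvectors, which is why plain subspace iteration suffices here.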

Making Smart Homes Smarter: Optimizing Energy Consumption with Human in the Loop

Verma, Mudit, Bhambri, Siddhant, Buduru, Arun Balaji

arXiv.org Artificial Intelligence, Dec-6-2019

Rapid advancements in the Internet of Things (IoT) have facilitated more efficient deployment of smart environment solutions for specific user requirements. With the increasing number of IoT devices, it has become difficult for users to control or operate every individual smart device to achieve a desired goal such as optimized power consumption or scheduled appliance running times. Furthermore, existing solutions for automatically adapting IoT devices do not adequately incorporate user behavior. This paper presents a novel approach to accurately configure IoT devices while achieving the twin objectives of optimizing energy consumption and conforming to user preferences. Our work comprises unsupervised clustering of devices' data to find the states of operation for each device, followed by probabilistic analysis of user behavior to determine their preferred states. Finally, we deploy an online reinforcement learning (RL) agent to find the best device settings automatically. Results on data sets from three different smart homes show the effectiveness of our methodology. To the best of our knowledge, this is the first time a practical approach has been adopted to achieve the above-mentioned objectives without any human interaction within the system.

consumption, domain state, power consumption

1912.03298

Country:

Asia > India > NCT > New Delhi (0.04)
Asia > India > NCT > Delhi (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report (1.00)

Industry:

Information Technology > Smart Houses & Appliances (1.00)
Energy (1.00)

Technology:

Information Technology > Internet of Things (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.67)
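
The abstract does not name the clustering algorithm used to find device operating states. As an assumption, the sketch below uses plain 1-D k-means on one device's power readings, with the number of states `k` chosen by hand and centroids initialized at evenly spaced order statistics. The function name and readings are hypothetical, and the later probabilistic and reinforcement-learning stages are not shown.

```python
def device_states(readings, k, iters=100):
    """Cluster one device's power readings into k operating states with 1-D k-means.

    Returns the sorted state centroids (same unit as the readings).
    """
    ordered = sorted(readings)
    n = len(ordered)
    # Initialise centroids at evenly spaced order statistics (min, ..., max)
    centers = [ordered[(i * (n - 1)) // max(k - 1, 1)] for i in range(k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for r in readings:
            nearest = min(range(k), key=lambda c: abs(r - centers[c]))
            buckets[nearest].append(r)
        # Move each centroid to the mean of its readings; keep empty ones where they are
        centers = [sum(b) / len(b) if b else centers[c] for c, b in enumerate(buckets)]
    return sorted(centers)


# Hypothetical readings (watts) from an appliance with three distinct power levels
readings = [0.5, 0.4, 0.6, 60.0, 61.0, 59.0, 1200.0, 1180.0]
states = device_states(readings, 3)
```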

A Clustering Approach to Edge Controller Placement in Software Defined Networks with Cost Balancing

Soleymanifar, Reza, Srivastava, Amber, Beck, Carolyn, Salapaka, Srinivasa

arXiv.org Machine Learning, Dec-5-2019

In this work we introduce two novel deterministic-annealing-based clustering algorithms to address the problem of Edge Controller Placement (ECP) in wireless edge networks. These networks lie at the core of fifth-generation (5G) wireless systems and beyond. These algorithms, ECP-LL and ECP-LB, address the dominant leader-less and leader-based controller placement topologies and have linear computational complexity in terms of network size, maximum number of clusters and dimensionality of data. Each algorithm tries to place controllers close to edge node clusters and not far from other controllers, to maintain a reasonable balance between synchronization and delay costs. While the ECP problem can be conveniently expressed as a multi-objective mixed integer nonlinear program (MINLP), our algorithms outperform the state-of-the-art MINLP solver BARON in terms of both accuracy and speed. Most ECP algorithms are highly susceptible to poor local minima and depend greatly on initialization; our proposed algorithms have the competitive edge of avoiding poor local minima through a Shannon entropy term in the clustering objective function.

Keywords: Clustering, deterministic annealing, 5G networks, software defined networks, wireless edge networks, edge controller placement

algorithm, controller, controller placement

1912.02915

Country:

North America > United States > Illinois > Champaign County > Urbana (0.04)
North America > United States > California (0.04)

Genre: Research Report (0.40)

Industry: Telecommunications (0.66)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.88)
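
A minimal sketch of generic deterministic annealing clustering, not the paper's ECP-LL or ECP-LB: it clusters plain 2-D points and has no controller synchronization or delay costs. At temperature T each point is associated with every center through the Gibbs distribution p(j | x) proportional to exp(-||x - y_j||^2 / T), which is what minimizing distortion minus T times the Shannon entropy of the associations yields; cooling T geometrically lets coincident centers split at phase transitions instead of depending on initialization. The cooling schedule, perturbation size, seed and data are assumptions chosen for illustration.

```python
import math
import random


def deterministic_annealing(points, k, t_start=1000.0, t_min=0.01, alpha=0.9,
                            inner=30, eps=1e-3, seed=0):
    """Soft clustering with geometric cooling; returns k centers (lists of floats)."""
    rng = random.Random(seed)
    dim = len(points[0])
    mean = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    # At high temperature all centers sit at the data mean
    centers = [list(mean) for _ in range(k)]
    t = t_start
    while t > t_min:
        # Tiny random nudge so coincident centers can separate at a phase transition
        for c in centers:
            for d in range(dim):
                c[d] += rng.uniform(-eps, eps)
        for _ in range(inner):
            sums = [[0.0] * dim for _ in range(k)]
            mass = [0.0] * k
            for p in points:
                dist = [sum((p[d] - c[d]) ** 2 for d in range(dim)) for c in centers]
                dmin = min(dist)
                # Gibbs association probabilities p(j | x); shifting by dmin keeps the sum z >= 1
                w = [math.exp(-(dj - dmin) / t) for dj in dist]
                z = sum(w)
                for j in range(k):
                    mass[j] += w[j] / z
                    for d in range(dim):
                        sums[j][d] += (w[j] / z) * p[d]
            # Each center moves to the probability-weighted mean of the points
            for j in range(k):
                if mass[j] > 1e-12:
                    centers[j] = [s / mass[j] for s in sums[j]]
        t *= alpha  # cool down
    return centers


# Hypothetical data: two well-separated groups of three points
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers = sorted(deterministic_annealing(points, 2))
```

As T approaches zero the associations become hard, so the final centers coincide with ordinary k-means centroids; the annealing only changes how the algorithm gets there.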

Clustering Time-Series by a Novel Slope-Based Similarity Measure Considering Particle Swarm Optimization

Kamalzadeh, Hossein, Ahmadi, Abbas, Mansour, Saeed

arXiv.org Machine Learning, Dec-5-2019

Recently there has been an increase in studies on time-series data mining, specifically time-series clustering, due to the prevalence of time series in various domains. The large volume of data in the form of time series makes it necessary to employ techniques such as clustering to understand the data and to extract information and hidden patterns. In time-series clustering specifically, the most important aspects are the similarity measure used and the algorithm employed to conduct the clustering. In this paper, a new similarity measure for time-series clustering is developed based on a combination of a simple representation of time series (the slope of each segment), Euclidean distance and so-called dynamic time warping. It is proved in this paper that the proposed distance measure is a metric, and thus indexing can be applied. For the clustering task, the Particle Swarm Optimization algorithm is employed. The proposed similarity measure is compared to three existing measures in terms of various criteria used for the evaluation of clustering algorithms. The results indicate that the proposed similarity measure outperforms the others on almost every dataset used in this paper.

algorithm, distance measure, sin 1

1912.02405

Country:

North America > United States > Texas > Dallas County > Dallas (0.04)
North America > United States > New York > New York County > New York City (0.04)
Asia > Middle East > Iran > Tehran Province > Tehran (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
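
Two of the ingredients named in the abstract can be sketched separately: a slope representation (here, assumed to be the least-squares slope of each fixed-length, non-overlapping segment) and dynamic time warping over the slope sequences. The abstract does not say how the paper combines them with Euclidean distance into a proven metric, so this is not that measure; in particular, plain DTW does not satisfy the triangle inequality. Segment length and example series are hypothetical, and the Particle Swarm Optimization clustering step is not shown.

```python
def segment_slopes(series, seg_len):
    """Least-squares slope of each non-overlapping segment of length seg_len (>= 2)."""
    slopes = []
    t_mean = (seg_len - 1) / 2
    den = sum((t - t_mean) ** 2 for t in range(seg_len))
    for start in range(0, len(series) - seg_len + 1, seg_len):
        seg = series[start:start + seg_len]
        x_mean = sum(seg) / seg_len
        num = sum((t - t_mean) * (x - x_mean) for t, x in enumerate(seg))
        slopes.append(num / den)
    return slopes


def dtw(a, b):
    """Classic dynamic time warping distance with absolute-difference cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


slopes_a = segment_slopes([0, 1, 2, 3, 3, 3], 3)           # rising, then flat
slopes_b = segment_slopes([0, 1, 2, 3, 4, 5, 5, 5, 5], 3)  # rising for longer, then flat
distance = dtw(slopes_a, slopes_b)
```

Here `slopes_b` has two rising segments, yet DTW aligns both with the single rising segment of `slopes_a`, so the two shapes get distance 0 despite their different lengths.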

A sparse negative binomial mixture model for clustering RNA-seq count data

Rahman, Tanbin, Li, Yujia, Ma, Tianzhou, Tang, Lu, Tseng, George

arXiv.org Machine Learning, Dec-5-2019

Clustering with variable selection is a challenging but critical task for modern small-n-large-p data. Existing methods based on Gaussian mixture models or sparse K-means provide solutions for continuous data. Given the prevalence of RNA-seq technology and the lack of count-data models for clustering, the current practice is to normalize count expression data into continuous measures and apply existing models with a Gaussian assumption. In this paper, we develop a negative binomial mixture model with gene regularization to cluster samples (small $n$) with high-dimensional gene features (large $p$). The EM algorithm is used for inference, and the Bayesian information criterion is used to determine tuning parameters. The method is compared with the sparse Gaussian mixture model and sparse K-means using extensive simulations and two real transcriptomic applications in breast cancer and rat brain studies. The results show superior performance of the proposed count-data model in clustering accuracy, feature selection and biological interpretation by pathway enrichment analysis.

algorithm, count data, selection

1912.02399

Country:

North America > United States > Maryland (0.04)
North America > United States > California > Alameda County > Oakland (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.89)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)
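
A stripped-down sketch of EM for a negative binomial mixture, under several assumptions not taken from the paper: genes are independent within a cluster, a single dispersion `r` is fixed rather than estimated, there is no gene regularization (so no feature selection), and `k` is given instead of chosen by BIC. With `r` fixed, the maximum-likelihood NB mean is the responsibility-weighted average count, which keeps the M-step in closed form. The counts and the deterministic initialization are hypothetical; real use would try several starts.

```python
import math


def nb_logpmf(x, mu, r):
    """Log-pmf of a negative binomial count x with mean mu and size (dispersion) r."""
    return (math.lgamma(x + r) - math.lgamma(r) - math.lgamma(x + 1)
            + r * math.log(r / (r + mu)) + x * math.log(mu / (r + mu)))


def nb_mixture_em(samples, k, r=10.0, iters=100):
    """EM for a k-component NB mixture with a fixed, shared dispersion r.

    samples: list of count vectors (one per sample); genes are treated as
             independent given the cluster. No gene regularization is applied.
    """
    n, p = len(samples), len(samples[0])
    # Simple deterministic start: means taken from k samples spread through the list
    means = [[max(float(v), 0.5) for v in samples[(c * n) // k]] for c in range(k)]
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: posterior cluster probabilities via log-sum-exp normalization
        resp = []
        for x in samples:
            logs = [math.log(weights[c]) + sum(nb_logpmf(x[g], means[c][g], r) for g in range(p))
                    for c in range(k)]
            top = max(logs)
            w = [math.exp(v - top) for v in logs]
            total = sum(w)
            resp.append([v / total for v in w])
        # M-step: with r fixed, the NB mean MLE is the responsibility-weighted mean count
        for c in range(k):
            nc = sum(resp[i][c] for i in range(n))
            if nc < 1e-12:
                continue  # empty component: keep its previous parameters
            weights[c] = nc / n
            means[c] = [max(sum(resp[i][c] * samples[i][g] for i in range(n)) / nc, 1e-6)
                        for g in range(p)]
    labels = [max(range(k), key=lambda c: resp[i][c]) for i in range(n)]
    return labels, means, weights


# Hypothetical counts: 6 samples x 3 genes, a low- and a high-expression group
samples = [
    [2, 3, 1], [1, 2, 2], [3, 1, 2],
    [40, 55, 50], [45, 60, 38], [52, 48, 44],
]
labels, means, weights = nb_mixture_em(samples, 2)
```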

benedekrozemberczki/ClusterGCN

#artificialintelligence, Dec-4-2019, 05:13:16 GMT

Graph convolutional networks (GCNs) have been successfully applied to many graph-based applications; however, training a large-scale GCN remains challenging. Current SGD-based algorithms suffer from either a high computational cost that grows exponentially with the number of GCN layers, or a large space requirement for keeping the entire graph and the embedding of each node in memory. In this paper, we propose Cluster-GCN, a novel GCN algorithm that is suitable for SGD-based training by exploiting the graph clustering structure. Cluster-GCN works as follows: at each step, it samples a block of nodes associated with a dense subgraph identified by a graph clustering algorithm, and restricts the neighborhood search to this subgraph. This simple but effective strategy leads to significantly improved memory and computational efficiency while achieving test accuracy comparable to previous algorithms.

algorithm, benedekrozemberczki clustergcn, cluster-gcn

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.79)
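
The batching idea in the abstract can be sketched without any GCN code: given a node-to-cluster assignment (assumed to come from some graph clustering algorithm), each mini-batch is the union of a few randomly chosen clusters, and only edges with both endpoints inside the batch are kept for neighborhood aggregation. The function name, its parameters and the toy graph are hypothetical, and model training itself is omitted.

```python
import random


def cluster_gcn_batches(edges, node_cluster, clusters_per_batch, seed=0):
    """Yield (nodes, edges) mini-batches built from whole clusters.

    edges:        list of (u, v) pairs
    node_cluster: dict mapping node -> cluster id (assumed precomputed)
    Only edges with both endpoints inside the batch are kept.
    """
    rng = random.Random(seed)
    cluster_ids = sorted(set(node_cluster.values()))
    rng.shuffle(cluster_ids)  # a new random order of clusters for this epoch
    for start in range(0, len(cluster_ids), clusters_per_batch):
        chosen = set(cluster_ids[start:start + clusters_per_batch])
        nodes = {v for v, c in node_cluster.items() if c in chosen}
        sub_edges = [(u, v) for u, v in edges if u in nodes and v in nodes]
        yield nodes, sub_edges


# Hypothetical 6-node ring split into three 2-node clusters
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
node_cluster = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2, 5: 2}
batches = list(cluster_gcn_batches(edges, node_cluster, clusters_per_batch=1))
```

With one cluster per batch, the three between-cluster edges of this ring are dropped; choosing several clusters per batch keeps the links among the chosen clusters at the cost of larger batches.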
