Goto

Collaborating Authors

 Clustering


Kernel-estimated Nonparametric Overlap-Based Syncytial Clustering

arXiv.org Machine Learning

Commonly-used clustering algorithms usually find ellipsoidal, spherical or other regular-structured clusters, but are more challenged when the underlying groups lack formal structure or definition. Syncytial clustering is the name that we introduce for methods that merge groups obtained from standard clustering algorithms in order to reveal complex group structure in the data. Here, we develop a distribution-free fully-automated syncytial clustering algorithm that can be used with $k$-means and other algorithms. Our approach computes the cumulative distribution function of the normed residuals from an appropriately fit $k$-groups model and calculates the nonparametric overlap between each pair of clusters. Groups with high pairwise overlap are merged as long as the generalized overlap decreases. Our methodology is always a top performer in identifying groups with regular and irregular structures in several datasets and can be applied to datasets with scatter or incomplete records. The approach is also used to identify the distinct kinds of gamma ray bursts in the Burst and Transient Source Experiment 4Br catalog and the distinct kinds of activation in a functional Magnetic Resonance Imaging study.


Unsupervised Feature Selection based on Adaptive Similarity Learning and Subspace Clustering

arXiv.org Machine Learning

Unsupervised Feature Selection based on Adaptive Similarity Learning and Subspace Clustering Mohsen Ghassemi Parsa a, Hadi Zare a,, Mehdi Ghatee b a Faculty of New Sciences and Technologies, University of Tehran, Iran b Department of Computer Science, Amirkabir University of Technology, IranAbstract Feature selection methods have an important role on the readability of data and the reduction of complexity of learning algorithms. In recent years, a variety of efforts are investigated on feature selection problems based on unsupervised viewpoint due to the laborious labeling task on large datasets. In this paper, we propose a novel approach on unsupervised feature selection initiated from the subspace clustering to preserve the similarities by representation learning of low dimensional subspaces among the samples. A self-expressive model is employed to implicitly learn the cluster similarities in an adaptive manner. The proposed method not only maintains the sample similarities through subspace clustering, but it also captures the discriminative information based on a regularized regression model. In line with the convergence analysis of the proposed method, the experimental results on benchmark datasets demonstrate the effectiveness of our approach as compared with the state of the art methods.


Transformed Subspace Clustering

arXiv.org Machine Learning

Subspace clustering assumes that the data is separable into separate subspaces. Such a simple assumption, does not always hold. We assume that, even if the raw data is not separable into subspaces, one can learn a representation ( transform coef-fi cients) such that the learnt representation is separable into subspaces. To achieve the intended goal, we embed subspace clustering techniques (locally linear manifold clustering, sparse subspace clustering and low rank representation) into transform learn ing. The entire formulation is jointly learnt; giving rise to a new class of methods called transformed subspace clustering (TSC). In order to account for non - linearity, ker-nelized extensions of TSC are also proposed. To test the performanc e of the propose d techniques, benchmarking is performed on image clustering and document clustering datasets. Comparison with state - of - the - art clustering techniques shows that our formulation improves upon them.


A time resolved clustering method revealing longterm structures and their short-term internal dynamics

arXiv.org Machine Learning

The last decades have not only been characterized by an explosive growth of data, but also an increasing appreciation of data as a valuable resource. It's value comes with the ability to extract meaningful patterns that are of economic, societal or scientific relevance. A particular challenge is to identify patterns across time, including patterns that might only become apparent when the temporal dimension is taken into account. Here, we present a novel method that aims to achieve this by detecting dynamic clusters, i.e. structural elements that can be present over prolonged durations. It is based on an adaptive identification of majority overlaps between groups at different time points and allows the accommodation of transient decompositions in otherwise persistent dynamic clusters. As such, our method enables the detection of persistent structural elements with internal dynamics and can be applied to any classifiable data, ranging from social contact networks to arbitrary sets of time stamped feature vectors. It provides a unique tool to study systems with non-trivial temporal dynamics with a broad applicability to scientific, societal and economic data.


Improved Analysis of Spectral Algorithm for Clustering

arXiv.org Machine Learning

Spectral algorithms are graph partitioning algorithms that partition a node set of a graph into groups by using a spectral embedding map. Clustering techniques based on the algorithms are referred to as spectral clustering and are widely used in data analysis. To gain a better understanding of why spectral clustering is successful, Peng et al. (2015) and Kolev and Mehlhorn (2016) studied the behavior of a certain type of spectral algorithm for a class of graphs, called well-clustered graphs. Specifically, they put an assumption on graphs and showed the performance guarantee of the spectral algorithm under it. The algorithm they studied used the spectral embedding map developed by Shi and Malic (2000). In this paper, we improve on their results, giving a better performance guarantee under a weaker assumption. We also evaluate the performance of the spectral algorithm with the spectral embedding map developed by Ng et al. (2002).


Making Smart Homes Smarter: Optimizing Energy Consumption with Human in the Loop

arXiv.org Artificial Intelligence

Rapid advancements in the Internet of Things (IoT) have facilitated more efficient deployment of smart environment solutions for specific user requirement. With the increase in the number of IoT devices, it has become difficult for the user to control or operate every individual smart device into achieving some desired goal like optimized power consumption, scheduled appliance running time, etc. Furthermore, existing solutions to automatically adapt the IoT devices are not capable enough to incorporate the user behavior. This paper presents a novel approach to accurately configure IoT devices while achieving the twin objectives of energy optimization along with conforming to user preferences. Our work comprises of unsupervised clustering of devices' data to find the states of operation for each device, followed by probabilistically analyzing user behavior to determine their preferred states. Eventually, we deploy an online reinforcement learning (RL) agent to find the best device settings automatically. Results for three different smart homes' data-sets show the effectiveness of our methodology. To the best of our knowledge, this is the first time that a practical approach has been adopted to achieve the above mentioned objectives without any human interaction within the system.


A Clustering Approach to Edge Controller Placement in Software Defined Networks with Cost Balancing

arXiv.org Machine Learning

A Clustering Approach to Edge Controller Placement in Software Defined Networks with Cost Balancing Reza Soleymanifar, Amber Srivastava, Carolyn Beck, Srinivasa Salapaka Abstract -- In this work we introduce two novel deterministic annealing based clustering algorithms to address the problem of Edge Controller Placement (ECP) in wireless edge networks. These networks lie at the core of the fifth generation (5G) wireless systems and beyond. These algorithms, ECP-LL and ECP-LB, address the dominant leader-less and leader-based controller placement topologies and have linear computational complexity in terms of network size, maximum number of clusters and dimensionality of data. Each algorithm tries to place controllers close to edge node clusters and not far away from other controllers to maintain a reasonable balance between synchronization and delay costs. While the ECP problem can be conveniently expressed as a multi-objective mixed integer nonlinear program (MINLP), our algorithms outperform state of art MINLP solver, BARON both in terms of accuracy and speed. Our proposed algorithms have the competitive edge of avoiding poor local minima through a Shannon entropy term in the clustering objective function. Most ECP algorithms are highly susceptible to poor local minima and greatly depend on initialization. Keywords: Clustering, deterministic annealing, 5G networks, software defined networks, wireless edge networks, edge controller placement I.


Clustering Time-Series by a Novel Slope-Based Similarity Measure Considering Particle Swarm Optimization

arXiv.org Machine Learning

Recently there has been an increase in the studies on time - series data mining specifically time - series clustering due to the vast existe nce of time - series in various domains. The large volume of data in the form of time - series make s it necessary to employ various techniques such as clustering to understand the data and to extract information and hidden patterns. In the field of clustering specifically, time - series clustering, the most important aspects are the similarity measure used and the algorithm employed to conduct the clustering. In this paper, a new similarity measure for time - series clustering is developed based on a combination of a simple representation of time - series, slope of each segment of time - series, Euclidean distance and the so - called dynamic time warping. It is proved in this paper that the proposed distance measure is metric and thus indexing can be applied. For the task of clustering, the Particle Swarm Optimization algorithm is employed. The proposed similarity measure is compared to three existing measures in terms of various criteria used for the evaluation of clustering algorithms. The results indicate that the propo sed similarity measure outperforms the rest in almost every dataset used in this paper.


A sparse negative binomial mixture model for clustering RNA-seq count data

arXiv.org Machine Learning

Clustering with variable selection is a challenging but critical task for modern small-n-large-p data. Existing methods based on Gaussian mixture models or sparse K-means provide solutions to continuous data. With the prevalence of RNA-seq technology and lack of count data modeling for clustering, the current practice is to normalize count expression data into continuous measures and apply existing models with Gaussian assumption. In this paper, we develop a negative binomial mixture model with gene regularization to cluster samples (small $n$) with high-dimensional gene features (large $p$). EM algorithm and Bayesian information criterion are used for inference and determining tuning parameters. The method is compared with sparse Gaussian mixture model and sparse K-means using extensive simulations and two real transcriptomic applications in breast cancer and rat brain studies. The result shows superior performance of the proposed count data model in clustering accuracy, feature selection and biological interpretation by pathway enrichment analysis.


benedekrozemberczki/ClusterGCN

#artificialintelligence

Graph convolutional network (GCN) has been successfully applied to many graph-based applications; however, training a large-scale GCN remains challenging. Current SGD-based algorithms suffer from either a high computational cost that exponentially grows with number of GCN layers, or a large space requirement for keeping the entire graph and the embedding of each node in memory. In this paper, we propose Cluster-GCN, a novel GCN algorithm that is suitable for SGD-based training by exploiting the graph clustering structure. Cluster-GCN works as the following: at each step, it samples a block of nodes that associate with a dense subgraph identified by a graph clustering algorithm, and restricts the neighborhood search within this subgraph. This simple but effective strategy leads to significantly improved memory and computational efficiency while being able to achieve comparable test accuracy with previous algorithms.