AITopics | Clustering

Collaborating Authors

Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)

News Overviews Instructional Materials AI-Alerts Classics

Clustering Discrete Valued Time Series

Roick, Tyler, Karlis, Dimitris, McNicholas, Paul D.

arXiv.org Machine LearningJan-26-2019

There is a need for the development of models that are able to account for discreteness in data, along with its time series properties and correlation. Our focus falls on INteger-valued AutoRegressive (INAR) type models. The INAR type models can be used in conjunction with existing model-based clustering techniques to cluster discrete valued time series data. With the use of a finite mixture model, several existing techniques such as the selection of the number of clusters, estimation using expectation-maximization and model selection are applicable. The proposed model is then demonstrated on real data to illustrate its clustering applications.

group membership, inar, time sery, (14 more...)

arXiv.org Machine Learning

1901.09249

Country:

North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
North America > United States > California (0.04)
North America > Canada > Ontario > Hamilton (0.04)
(3 more...)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Optimized Deformed Laplacian for Spectrum-based Community Detection in Sparse Heterogeneous Graphs

Dall'Amico, Lorenzo, Couillet, Romain, Tremblay, Nicolas

arXiv.org Machine LearningJan-25-2019

Spectral clustering is one of the most popular, yet still incompletely understood, methods for community detection on graphs. In this article we study spectral clustering based on the deformed Laplacian matrix $D-rA$, for sparse heterogeneous graphs (following a two-class degree-corrected stochastic block model). For a specific value $r = \zeta$, we show that, unlike competing methods such as the Bethe Hessian or non-backtracking operator approaches, clustering is insensitive to the graph heterogeneity. Based on heuristic arguments, we study the behavior of the informative eigenvector of $D-\zeta A$ and, as a result, we accurately predict the clustering accuracy. Via extensive simulations and application to real networks, the resulting clustering algorithm is validated and observed to systematically outperform state-of-the-art competing methods.

eigenvalue, eigenvector, graph, (13 more...)

arXiv.org Machine Learning

1901.09715

Country:

Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.05)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.88)

Add feedback

Subspace Clustering of Very Sparse High-Dimensional Data

Peng, Hankui, Pavlidis, Nicos, Eckley, Idris, Tsalamanis, Ioannis

arXiv.org Machine LearningJan-25-2019

In this paper we consider the problem of clustering collections of very short texts using subspace clustering. This problem arises in many applications such as product categorisation, fraud detection, and sentiment analysis. The main challenge lies in the fact that the vectorial representation of short texts is both high-dimensional, due to the large number of unique terms in the corpus, and extremely sparse, as each text contains a very small number of words with no repetition. We propose a new, simple subspace clustering algorithm that relies on linear algebra to cluster such datasets. Experimental results on identifying product categories from product names obtained from the US Amazon website indicate that the algorithm can be competitive against state-of-the-art clustering algorithms.

algorithm, matrix, subspace, (17 more...)

arXiv.org Machine Learning

1901.09108

Country:

Asia > Middle East > Jordan (0.05)
Europe > United Kingdom > England > Lancashire > Lancaster (0.05)
Europe > United Kingdom > Wales > Newport (0.04)
Europe > France (0.04)

Genre: Research Report (0.40)

Industry: Law Enforcement & Public Safety (0.34)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.88)

Add feedback

A Kalman filtering induced heuristic optimization based partitional data clustering

Pakrashi, Arjun, Chaudhuri, Bidyut B.

arXiv.org Machine LearningJan-25-2019

Clustering algorithms have regained momentum with recent popularity of data mining and knowledge discovery approaches. To obtain good clustering in reasonable amountof time, various meta-heuristic approaches and their hybridization, sometimes with K-Means technique, have been employed. A Kalman Filtering basedheuristic approach called Heuristic Kalman Algorithm (HKA) has been proposed a few years ago, which may be used for optimizing an objective functionin data/feature space. In this paper at first HKA is employed in partitional data clustering. Then an improved approach named HKA-K is proposed, whichcombines the benefits of global exploration of HKA and the fast convergence of K-Means method. Implemented and tested on several datasets from UCI machine learning repository, the results obtained by HKA-K were compared with other hybrid meta-heuristic clustering approaches. It is shown that HKA-K is atleast as good as and often better than the other compared algorithms. Keywords:Clustering, K-Means, Optimization, Metaheuristic Optimization, Heuristics 1. Introduction Clustering is the process of assigning a set of n data points into C classes based on the similarity between the data points in the feature space. It is useful when some prototype data from known classes are not available for training a supervised classifier or for an exploratory data analysis task. It is one of the earliest pattern classification approaches and has found renewed interest since the beginning of data mining and big data analytics.

algorithm, dataset, sciencedirect, (14 more...)

arXiv.org Machine Learning

doi: 10.1016/j.ins.2016.07.057

1901.09082

Country:

Oceania > Australia (0.04)
North America > United States > New York (0.04)
North America > United States > New Jersey > Hudson County > Secaucus (0.04)
(2 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Guarantees for Spectral Clustering with Fairness Constraints

Kleindessner, Matthäus, Samadi, Samira, Awasthi, Pranjal, Morgenstern, Jamie

arXiv.org Machine LearningJan-24-2019

Given the widespread popularity of spectral clustering (SC) for partitioning graph data, we study a version of constrained SC in which we try to incorporate the fairness notion proposed by Chierichetti et al. (2017). According to this notion, a clustering is fair if every demographic group is approximately proportionally represented in each cluster. To this end, we develop variants of both normalized and unnormalized constrained SC and show that they help find fairer clusterings on both synthetic and real data. We also provide a rigorous theoretical analysis of our algorithms. While there have been efforts to incorporate various constraints into the SC framework, theoretically analyzing them is a challenging problem. We overcome this by proposing a natural variant of the stochastic block model where h groups have strong inter-group connectivity, but also exhibit a "natural" clustering structure which is fair. We prove that our algorithms can recover this fair clustering with high probability.

algorithm, algorithm 2, spectral clustering, (12 more...)

arXiv.org Machine Learning

1901.08668

Country:

North America > United States > Connecticut > Hartford County > Hartford (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report (0.63)
Instructional Material (0.45)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.93)

Add feedback

Thirty Years of Machine Learning:The Road to Pareto-Optimal Next-Generation Wireless Networks

Wang, Jingjing, Jiang, Chunxiao, Zhang, Haijun, Ren, Yong, Chen, Kwang-Cheng, Hanzo, Lajos

arXiv.org Machine LearningJan-23-2019

Next-generation wireless networks (NGWN) have a substantial potential in terms of supporting a broad range of complex compelling applications both in military and civilian fields, where the users are able to enjoy high-rate, low-latency, low-cost and reliable information services. Achieving this ambitious goal requires new radio techniques for adaptive learning and intelligent decision making because of the complex heterogeneous nature of the network structures and wireless services. Machine learning algorithms have great success in supporting big data analytics, efficient parameter estimation and interactive decision making. Hence, in this article, we review the thirty-year history of machine learning by elaborating on supervised learning, unsupervised learning, reinforcement learning and deep learning, respectively. Furthermore, we investigate their employment in the compelling applications of NGWNs, including heterogeneous networks (HetNets), cognitive radios (CR), Internet of things (IoT), machine to machine networks (M2M), and so on. This article aims for assisting the readers in clarifying the motivation and methodology of the various machine learning algorithms, so as to invoke them for hitherto unexplored services as well as scenarios of future wireless networks.

algorithm, ieee transaction, wireless network, (14 more...)

arXiv.org Machine Learning

1902.01946

Country:

North America > United States > Florida > Hillsborough County > Tampa (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > South Korea (0.14)
(37 more...)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)

Industry:

Transportation (1.00)
Telecommunications > Networks (1.00)
Information Technology > Security & Privacy (1.00)
(5 more...)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
(8 more...)

Add feedback

Deep Clustering with a Dynamic Autoencoder

Mrabah, Nairouz, Khan, Naimul Mefraz, Ksantini, Riadh

arXiv.org Machine LearningJan-23-2019

In unsupervised learning, there is no obvious straightforward loss function which can capture the major factors of variations and similarities. Since natural systems have smooth dynamics, an opportunity is lost if an unsupervised loss function remains static during the training process. The absence of concrete supervision suggests that smooth complex dynamics should be integrated as a substitute to the classical static loss functions to better make use of the gradual and uncertain knowledge acquired through self-supervision. In this paper, we propose Dynamic Autoencoder (DynAE), a new model for deep clustering that allows to solve a clustering-reconstruction trade-off by gradually and smoothly eliminating the reconstruction objective in favor of a construction one while preserving the space topology. Experimental evaluations on benchmark datasets show that our approach achieves state-of-the-art results compared to all the other autoencoder-based clustering methods.

dataset, loss function, representation, (14 more...)

arXiv.org Machine Learning

1901.07752

Country:

Africa > Middle East > Tunisia > Tunis Governorate > Tunis (0.04)
North America > United States > California > Alameda County > Oakland (0.04)
North America > Canada > Ontario > Toronto (0.04)
(2 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

CommunityGAN: Community Detection with Generative Adversarial Nets

Jia, Yuting, Zhang, Qinqin, Zhang, Weinan, Wang, Xinbing

arXiv.org Artificial IntelligenceJan-22-2019

Community detection refers to the task of discovering groups of vertices sharing similar properties or functions so as to understand the network data. With the recent development of deep learning, graph representation learning techniques are also utilized for community detection. However, the communities can only be inferred by applying clustering algorithms based on learned vertex embeddings. These general cluster algorithms like K-means and Gaussian Mixture Model cannot output much overlapped communities, which have been proved to be very common in many real-world networks. In this paper, we propose CommunityGAN, a novel community detection framework that jointly solves overlapping community detection and graph representation learning. First, unlike the embedding of conventional graph representation learning algorithms where the vector entry values have no specific meanings, the embedding of CommunityGAN indicates the membership strength of vertices to communities. Second, a specifically designed Generative Adversarial Net (GAN) is adopted to optimize such embedding. Through the minimax competition between the motif-level generator and discriminator, both of them can alternatively and iteratively boost their performance and finally output a better community structure. Extensive experiments on synthetic data and real-world tasks demonstrate that CommunityGAN achieves substantial community detection performance gains over the state-of-the-art methods.

data mining, machine learning, vertex, (18 more...)

arXiv.org Artificial Intelligence

1901.06631

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > United States > New York > New York County > New York City (0.04)
Asia > China > Shanghai > Shanghai (0.04)
Africa > Senegal > Kolda Region > Kolda (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology (0.34)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Add feedback

Unsupervised Automated Event Detection using an Iterative Clustering based Segmentation Approach

Gupta, Deepak K., Shrivastava, Rohit K., Phadke, Suhas, Goudswaard, Jeroen

arXiv.org Artificial IntelligenceJan-22-2019

A class of vision problems, less commonly studied, consists of detecting objects in imagery obtained from physics-based experiments. These objects can span in 4D (x, y, z, t) and are visible as disturbances (caused due to physical phenomena) in the image with background distribution being approximately uniform. Such objects, occasionally referred to as `events', can be considered as high energy blobs in the image. Unlike the images analyzed in conventional vision problems, very limited features are associated with such events, and their shape, size and count can vary significantly. This poses a challenge on the use of pre-trained models obtained from supervised approaches. In this paper, we propose an unsupervised approach involving iterative clustering based segmentation (ICS) which can detect target objects (events) in real-time. In this approach, a test image is analyzed over several cycles, and one event is identified per cycle. Each cycle consists of the following steps: (1) image segmentation using a modified k-means clustering method, (2) elimination of empty (with no events) segments based on statistical analysis of each segment, (3) merging segments that overlap (correspond to same event), and (4) selecting the strongest event. These four steps are repeated until all the events have been identified. The ICS approach consists of a few hyper-parameters that have been chosen based on statistical study performed over a set of test images. The applicability of ICS method is demonstrated on several 2D and 3D test examples.

artificial intelligence, machine learning, partition, (16 more...)

arXiv.org Artificial Intelligence

1901.07222

Country:

North America > United States > New York (0.04)
Asia > India > Karnataka > Bengaluru (0.04)

Genre:

Research Report (0.50)
Workflow (0.48)

Industry: Energy (0.68)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Modal clustering asymptotics with applications to bandwidth selection

Casa, Alessandro, Chacón, José E., Menardi, Giovanna

arXiv.org Machine LearningJan-22-2019

Density-based clustering relies on the idea of linking groups to some specific features of the probability distribution underlying the data. The reference to a true, yet unknown, population structure allows to frame the clustering problem in a standard inferential setting, where the concept of ideal population clustering is defined as the partition induced by the true density function. The nonparametric formulation of this approach, known as modal clustering, draws a correspondence between the groups and the domains of attraction of the density modes. Operationally, a nonparametric density estimate is required and a proper selection of the amount of smoothing, governing the shape of the density and hence possibly the modal structure, is crucial to identify the final partition. In this work, we address the issue of density estimation for modal clustering from an asymptotic perspective. A natural and easy to interpret metric to measure the distance between density-based partitions is discussed, its asymptotic approximation explored, and employed to study the problem of bandwidth selection for nonparametric modal clustering.

bandwidth selection, estimation, modal, (16 more...)

arXiv.org Machine Learning

1901.073

Country:

Europe > Austria > Vienna (0.14)
North America > United States > New York (0.04)
Europe > Spain > Extremadura (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback