AITopics | Clustering

Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)

GLOMA: Embedding Global Information in Local Matrix Approximation Models for Collaborative Filtering

AAAI Conferences | Feb-14-2017

Recommender systems have achieved great success in recent years, and matrix approximation (MA) is one of the most popular techniques for collaborative filtering (CF) based recommendation. However, a major issue is that MA methods perform poorly at detecting strong localized associations among closely related users and items. Recently, some MA-based CF methods have adopted clustering to discover meaningful user-item subgroups and ensemble over different clusterings to improve recommendation accuracy. However, ensemble learning suffers from lower efficiency due to the increased overall computation overhead. In this paper, we propose GLOMA, a new clustering-based matrix approximation method, which can embed global information in local matrix approximation models to improve recommendation accuracy. In GLOMA, an MA model is first trained on the entire data to capture global information. The global MA model is then used to guide the training of cluster-based local MA models, such that the local models can detect strong localized associations shared within clusters and at the same time preserve global associations shared among all users/items. Evaluation results using MovieLens and Netflix datasets demonstrate that, by integrating global information in local models, GLOMA can outperform five state-of-the-art MA-based CF methods in recommendation accuracy while achieving decent efficiency.

accuracy, artificial intelligence, machine learning, (17 more...)

AAAI Conferences

Thirty-First AAAI Conference on Artificial Intelligence

Country:

Asia (0.47)
North America > United States > Colorado (0.28)

Genre: Research Report > New Finding (0.68)

Industry:

Information Technology (0.35)
Media > Film (0.35)
Leisure & Entertainment (0.35)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.50)

PIVE: Per-Iteration Visualization Environment for Real-Time Interactions with Dimension Reduction and Clustering

AAAI Conferences | Feb-14-2017

One of the key advantages of visual analytics is its capability to leverage both humans' visual perception and the power of computing. A big obstacle in integrating machine learning with visual analytics is its high computing cost. To tackle this problem, this paper presents PIVE (Per-Iteration Visualization Environment), which supports real-time interactive visualization with machine learning. By immediately visualizing the intermediate results from algorithm iterations, PIVE enables users to quickly grasp insights and interact with the intermediate output, which then affects subsequent algorithm iterations. In addition, we propose a widely-applicable interaction methodology that allows efficient incorporation of user feedback into virtually any iterative computational method without introducing additional computational cost. We demonstrate the application of PIVE for various dimension reduction algorithms, such as multidimensional scaling and t-SNE, and for clustering and topic modeling algorithms, such as k-means and latent Dirichlet allocation.

artificial intelligence, iteration, machine learning, (15 more...)

AAAI Conferences

Thirty-First AAAI Conference on Artificial Intelligence

Country: North America > United States (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.93)

IEDC: An Integrated Approach for Overlapping and Non-overlapping Community Detection

Hajiabadi, Mahdi, Zare, Hadi, Bobarshad, Hossein

arXiv.org Machine Learning | Feb-13-2017

Community detection is a task of fundamental importance in social network analysis that can be used in a variety of knowledge-based domains. While there exist many works on community detection based on connectivity structures, they are limited to considering either overlapping or non-overlapping communities, but not both. In this work, we propose a novel approach for general community detection through an integrated framework to extract the overlapping and non-overlapping community structures without assuming prior structural connectivity on networks. Our general framework is based on a primary node-based criterion which consists of the internal association degree along with the external association degree. The proposed method is evaluated through extensive simulation experiments and several benchmark real network datasets. The experimental results show that the proposed method outperforms the earlier state-of-the-art algorithms based on well-known evaluation criteria. Introduction: Identifying communities is one of the most fundamental tasks in network science. The detection of community structures has allowed us to study and discover the latent underlying mechanism behind the relationships of the entities of networks. Community detection can be considered an unsupervised learning problem.

community structure, data mining, machine learning, (17 more...)

arXiv.org Machine Learning

doi: 10.1016/j.knosys.2017.02.018

1612.04679

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.48)

Industry:

Health & Medicine (0.46)
Information Technology > Services (0.35)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Statistical Inference for Cluster Trees

Kim, Jisu, Chen, Yen-Chi, Balakrishnan, Sivaraman, Rinaldo, Alessandro, Wasserman, Larry

arXiv.org Machine Learning | Feb-12-2017

A cluster tree provides a highly-interpretable summary of a density function by representing the hierarchy of its high-density clusters. It is estimated using the empirical tree, which is the cluster tree constructed from a density estimator. This paper addresses the basic question of quantifying our uncertainty by assessing the statistical significance of topological features of an empirical cluster tree. We first study a variety of metrics that can be used to compare different trees, analyze their properties and assess their suitability for inference. We then propose methods to construct and summarize confidence sets for the unknown true cluster tree. We introduce a partial ordering on cluster trees which we use to prune some of the statistically insignificant features of the empirical tree, yielding interpretable and parsimonious cluster trees. Finally, we illustrate the proposed methods on a variety of synthetic examples and furthermore demonstrate their utility in the analysis of a Graft-versus-Host Disease (GvHD) data set.

artificial intelligence, cluster tree, machine learning, (18 more...)

arXiv.org Machine Learning

1605.06416

Country: North America > United States (0.46)

Genre: Research Report (0.70)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

Sparse Convex Clustering

Wang, Binhuan, Zhang, Yilong, Sun, Will Wei, Fang, Yixin

arXiv.org Machine Learning | Feb-10-2017

Convex clustering, a convex relaxation of k-means clustering and hierarchical clustering, has drawn recent attention since it nicely addresses the instability issue of traditional nonconvex clustering methods. Although its computational and statistical properties have been recently studied, the performance of convex clustering has not yet been investigated in the high-dimensional clustering scenario, where the data contains a large number of features and many of them carry no information about the clustering structure. In this paper, we demonstrate that the performance of convex clustering could be distorted when the uninformative features are included in the clustering. To overcome this, we introduce a new clustering method, referred to as Sparse Convex Clustering, to simultaneously cluster observations and conduct feature selection. The key idea is to formulate convex clustering in a form of regularization, with an adaptive group-lasso penalty term on cluster centers. In order to optimally balance the tradeoff between the cluster fitting and sparsity, a tuning criterion based on clustering stability is developed. In theory, we provide an unbiased estimator for the degrees of freedom of the proposed sparse convex clustering method. Finally, the effectiveness of the sparse convex clustering is examined through a variety of numerical experiments and a real data application.
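The abstract describes the objective only in words. As a rough sketch of what "convex clustering plus an adaptive group-lasso penalty on cluster centers" could look like, the block below writes one plausible form. The notation is an assumption, not taken from this page: X is the n-by-p data matrix, A holds one center per observation (row), a_{.j} is the j-th column of A, w and u are nonnegative weights, and gamma_1, gamma_2 are tuning parameters.

```latex
\min_{A \in \mathbb{R}^{n \times p}} \;
  \frac{1}{2} \sum_{i=1}^{n} \lVert X_{i\cdot} - A_{i\cdot} \rVert_2^2
  \;+\; \gamma_1 \sum_{i < i'} w_{ii'} \, \lVert A_{i\cdot} - A_{i'\cdot} \rVert_q
  \;+\; \gamma_2 \sum_{j=1}^{p} u_j \, \lVert a_{\cdot j} \rVert_2
```

Under this reading, the first two terms are the usual convex clustering fit and fusion penalty: rows of A that fuse to the same value form a cluster. The third term can shrink an entire column of A to zero, which, for centered data, would mark feature j as uninformative for the clustering.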

convex, selection, sparse convex, (16 more...)

arXiv.org Machine Learning

doi: 10.1080/10618600.2017.1377081

1601.04586

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > New Jersey (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Compressive K-means

Keriven, Nicolas, Tremblay, Nicolas, Traonmilin, Yann, Gribonval, Rémi

arXiv.org Machine Learning | Feb-10-2017

The Lloyd-Max algorithm is a classical approach to perform K-means clustering. Unfortunately, its cost becomes prohibitive as the training dataset grows large. We propose a compressive version of K-means (CKM), which estimates cluster centers from a sketch, i.e., from a drastically compressed representation of the training dataset. We demonstrate empirically that CKM performs similarly to Lloyd-Max, for a sketch size proportional to the number of centroids times the ambient dimension, and independent of the size of the original dataset. Given the sketch, the computational complexity of CKM is also independent of the size of the dataset. Unlike Lloyd-Max, which requires several replicates, we further demonstrate that CKM is almost insensitive to initialization. For a large dataset of 10^7 data points, we show that CKM can run two orders of magnitude faster than five replicates of Lloyd-Max, with similar clustering performance on artificial data. Finally, CKM achieves lower classification errors on handwritten digits classification.
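For context on the baseline the abstract compares against, here is a minimal sketch of Lloyd's algorithm with several replicates, using only the Python standard library. It is not the CKM method and not the authors' code. The function names `lloyd_kmeans` and `best_of_replicates`, the random-point initialization, and the default of five replicates are illustrative assumptions. Note that every iteration touches all N points, which is the dataset-size dependence the abstract refers to.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    k = len(centroids)
    return [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]


def lloyd_kmeans(points, k, n_iter=100, seed=0):
    """One run of Lloyd's algorithm; returns (centroids, labels, sse)."""
    rng = random.Random(seed)
    # Assumed initialization: k distinct data points drawn at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(n_iter):
        labels = assign(points, centroids)  # assignment step, O(N * k * d)
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                dim = len(members[0])
                new_centroids.append(
                    [sum(m[d] for m in members) / len(members) for d in range(dim)]
                )
            else:
                new_centroids.append(centroids[j])  # keep an emptied centroid
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    labels = assign(points, centroids)
    sse = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, sse


def best_of_replicates(points, k, n_replicates=5):
    """Run several differently seeded replicates and keep the lowest-SSE one."""
    runs = [lloyd_kmeans(points, k, seed=s) for s in range(n_replicates)]
    return min(runs, key=lambda run: run[2])
```

Replicates are needed because a single run can stop in a poor local optimum depending on the initial centroids; keeping the run with the lowest sum of squared errors is one common remedy.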

algorithm, artificial intelligence, machine learning, (18 more...)

arXiv.org Machine Learning

1610.08738

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.48)

Clustering For Point Pattern Data

Tran, Quang N., Vo, Ba-Ngu, Phung, Dinh, Vo, Ba-Tuong

arXiv.org Machine Learning | Feb-7-2017

Clustering is one of the most common unsupervised learning tasks in machine learning and data mining. Clustering algorithms have been used in a plethora of applications across several scientific fields. However, there has been limited research in the clustering of point patterns - sets or multi-sets of unordered elements - that are found in numerous applications and data sources. In this paper, we propose two approaches for clustering point patterns. The first is a non-parametric method based on novel distances for sets. The second is a model-based approach, formulated via random finite set theory, and solved by the Expectation-Maximization algorithm. Numerical experiments show that the proposed methods perform well on both simulated and real data.

algorithm, dataset, point pattern, (15 more...)

arXiv.org Machine Learning

1702.02262

Country:

Oceania > Australia (0.04)
North America > Canada > Ontario > Toronto (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.50)

Industry: Health & Medicine (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Robust Clustering for Time Series Using Spectral Densities and Functional Data Analysis

Rivera-García, Diego, García-Escudero, Luis Angel, Mayo-Iscar, Agustín, Ortega, Joaquín

arXiv.org Machine Learning | Feb-7-2017

In this work a robust clustering algorithm for stationary time series is proposed. The algorithm is based on the use of estimated spectral densities, which are considered as functional data, as the basic characteristic of stationary time series for clustering purposes. A robust algorithm for functional data is then applied to the set of spectral densities. Trimming techniques and restrictions on the scatter within groups reduce the effect of noise in the data and help to prevent the identification of spurious clusters. The procedure is tested in a simulation study, and is also applied to a real data set.
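To illustrate why a spectral density is a convenient clustering feature for stationary series, the sketch below computes a raw periodogram with the standard library. This is only one simple, unsmoothed estimator and is an assumption here; the abstract does not say which estimator the authors use, and the robust functional clustering step is not shown. `periodogram` is a hypothetical helper name.

```python
import cmath
import math


def periodogram(x):
    """Raw periodogram of a series at Fourier frequencies k/n, k = 1..n//2.

    The series is mean-centered first, so the zero frequency is omitted.
    """
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    spec = []
    for k in range(1, n // 2 + 1):
        # Discrete Fourier coefficient at frequency k/n (direct O(n^2) sum).
        coef = sum(
            c * cmath.exp(-2j * math.pi * k * t / n) for t, c in enumerate(centered)
        )
        spec.append(abs(coef) ** 2 / n)
    return spec


# Two series with the same frequency content but different phase.
n = 64
series_a = [math.cos(2 * math.pi * 4 * t / n) for t in range(n)]
series_b = [math.cos(2 * math.pi * 4 * t / n + 1.0) for t in range(n)]
spec_a = periodogram(series_a)
spec_b = periodogram(series_b)
```

The two example series differ pointwise because of the phase shift, yet their periodograms coincide and peak at the same frequency. Clustering on spectral features therefore groups series by their oscillatory structure rather than by alignment in time.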

artificial intelligence, data mining, machine learning, (18 more...)

arXiv.org Machine Learning

1702.02165

Country: North America > United States (1.00)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Data Science > Data Mining (0.89)

Shape-Based Approach to Household Load Curve Clustering and Prediction

Teeraratkul, Thanchanok, O'Neill, Daniel, Lall, Sanjay

arXiv.org Machine Learning | Feb-5-2017

Consumer Demand Response (DR) is an important research and industry problem, which seeks to categorize, predict and modify consumers' energy consumption. Unfortunately, traditional clustering methods have resulted in many hundreds of clusters, with a given consumer often associated with several clusters, making it difficult to classify consumers into stable representative groups and to predict individual energy consumption patterns. In this paper, we present a shape-based approach that better classifies and predicts consumer energy consumption behavior at the household level. The method is based on Dynamic Time Warping (DTW). DTW seeks an optimal alignment between energy consumption patterns reflecting the effect of hidden patterns of regular consumer behavior. Using real consumer 24-hour load curves from Opower Corporation, our method results in a 50% reduction in the number of representative groups and an improvement in prediction accuracy measured under DTW distance. We extend the approach to estimate which electrical devices will be used and in which hours.
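The abstract names Dynamic Time Warping but does not spell out the recurrence. A minimal sketch of the textbook DTW distance follows, assuming an absolute-difference local cost and no warping-window constraint. Both choices are assumptions, and the paper may use a different variant. `dtw_distance` is a hypothetical helper name.

```python
def dtw_distance(a, b):
    """Textbook dynamic-time-warping distance between two numeric sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment cost of a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])  # local cost (assumed: absolute difference)
            cost[i][j] = step + min(
                cost[i - 1][j],      # a[i-1] also matches the previous b element
                cost[i][j - 1],      # b[j-1] also matches the previous a element
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]


# Two toy "load curves" with the same shape, one delayed by one time step.
morning_peak = [0, 1, 2, 1, 0]
delayed_peak = [0, 0, 1, 2, 1, 0]
```

Here `dtw_distance(morning_peak, delayed_peak)` is zero because the warping path absorbs the one-step delay, whereas a pointwise comparison would penalize it. That is the sense in which a DTW-based method compares curves by shape rather than by exact timing.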

artificial intelligence, data mining, machine learning, (17 more...)

arXiv.org Machine Learning

doi: 10.1109/TSG.2017.2683461

1702.01414

Country: North America > United States > California (0.28)

Genre: Research Report (0.40)

Industry: Energy (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)

Comparison of Clustering Techniques for Residential Energy Behavior Using Smart Meter Data

Jin, Ling (Lawrence Berkeley National Laboratory) | Lee, Doris (Lawrence Berkeley National Laboratory) | Sim, Alex (Lawrence Berkeley National Laboratory) | Borgeson, Sam (Lawrence Berkeley National Laboratory) | Wu, Kesheng (Lawrence Berkeley National Laboratory) | Spurlock, C. Anna (Lawrence Berkeley National Laboratory) | Todd, Annika (Lawrence Berkeley National Laboratory)

AAAI Conferences | Feb-4-2017

Current practice in whole time series clustering of residential meter data focuses on aggregated or subsampled load data at the customer level, which ignores day-to-day differences within customers. This information is critical to determine each customer’s suitability for various demand-side management strategies that support intelligent power grids and smart energy management. Clustering daily load shapes provides fine-grained information on customer attributes and sources of variation for subsequent models and customer segmentation. In this paper, we apply 11 clustering methods to daily residential meter data. We evaluate their parameter settings and suitability based on 6 generic performance metrics and post-checking of resulting clusters. Finally, we recommend suitable techniques and parameters based on the goal of discovering diverse daily load patterns among residential customers. To the authors’ knowledge, this paper is the first robust comparative review of clustering techniques applied to daily residential load shape time series in the power systems literature.

clustering technique, residential energy behavior, smart meter data

AAAI Conferences

Workshops at the Thirty-First AAAI Conference on Artificial Intelligence

Industry: Energy > Power Industry (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.60)
