AITopics | Clustering


Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)


Learning Human-Understandable Strategies

Ganzfried, Sam (Florida International University) | Yusuf, Farzana (Florida International University)

AAAI Conferences | Feb-4-2017

Algorithms for equilibrium computation generally make no attempt to ensure that the computed strategies are understandable by humans. For instance, the strategies for the strongest poker agents are represented as massive binary files. In many situations, we would like to compute strategies that can actually be implemented by humans, who may have computational limitations and may only be able to remember a small number of features or components of the strategies that have been computed. We study poker games where private information distributions can be arbitrary. We create a large training set of game instances and solutions by randomly selecting the private information probabilities, and present algorithms that learn from the training instances in order to perform well in games with unseen information distributions. One approach first clusters the training points into a small number of clusters and then creates a small decision tree based on the cluster centers. This approach produces low test error and could be easily implemented by humans, since it only requires memorizing a small number of "if-then" rules.
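The cluster-then-rules idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes each training instance is reduced to a single hypothetical probability parameter paired with one action, clusters those parameters with a plain one-dimensional k-means, labels each cluster center by majority vote, and places rule thresholds halfway between adjacent centers. The names `kmeans_1d`, `learn_rules`, `apply_rules`, and the toy data are all invented for illustration.

```python
from collections import Counter

def kmeans_1d(values, k, iters=20):
    """Lloyd's k-means on scalars with a deterministic quantile start."""
    ordered = sorted(values)
    n = len(ordered)
    centers = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            groups[min(range(k), key=lambda j: abs(v - centers[j]))].append(v)
        # An empty cluster keeps its previous center.
        centers = [sum(g) / len(g) if g else centers[j] for j, g in enumerate(groups)]
    return centers

def learn_rules(instances, k):
    """Return [(upper_threshold, action), ...] ordered by threshold."""
    centers = kmeans_1d([p for p, _ in instances], k)
    votes = [Counter() for _ in range(k)]
    for p, action in instances:
        votes[min(range(k), key=lambda j: abs(p - centers[j]))][action] += 1
    # Label each non-empty cluster center with its majority action.
    labeled = sorted((c, v.most_common(1)[0][0]) for c, v in zip(centers, votes) if v)
    rules = []
    for i, (center, action) in enumerate(labeled):
        # Split halfway to the next center; the last rule is open-ended.
        upper = (center + labeled[i + 1][0]) / 2 if i + 1 < len(labeled) else float("inf")
        rules.append((upper, action))
    return rules

def apply_rules(rules, p):
    """Evaluate the if-then chain: the first rule whose threshold exceeds p wins."""
    for upper, action in rules:
        if p < upper:
            return action
    return rules[-1][1]

# Hypothetical toy data: (probability parameter, action taken in the solved game).
training = [(0.05, "fold"), (0.10, "fold"), (0.15, "fold"),
            (0.45, "call"), (0.50, "call"), (0.55, "call"),
            (0.85, "raise"), (0.90, "raise"), (0.95, "raise")]
rules = learn_rules(training, k=3)
```

With three clusters the result is a chain of three rules, which is the kind of compact representation a person could memorize. A real game instance would have several parameters and a richer strategy, so the one-dimensional thresholds here stand in for a small decision tree.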

decision tree, player 1, probability, (16 more...)

AAAI Conferences

Workshops at the Thirty-First AAAI Conference on Artificial Intelligence

Country:

North America > Canada > Alberta (0.14)
North America > United States > Texas (0.06)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.34)


Clustering-Aided Approach for Predicting Patient Outcomes with Application to Elderly Healthcare in Ireland

Elbattah, Mahmoud (National University of Ireland Galway) | Molloy, Owen (National University of Ireland Galway)

AAAI Conferences | Feb-4-2017

Predictive analytics has shown promising capabilities and opportunities in many aspects of healthcare practice. Data-driven insights can provide an important part of the solution for curbing rising costs and improving care quality. The paper implements machine learning techniques in an attempt to support decision making in relation to elderly healthcare in Ireland, with a particular focus on hip fracture care. We adopt a combination of unsupervised and supervised learning for predicting patient outcomes. Initially, elderly patients are grouped based on the similarity of age, length of stay (LOS) and elapsed time to surgery. Using the K-Means algorithm, our clustering experiments suggest the presence of three coherent clusters of patients. Subsequently, the discovered clusters are used to train prediction models, each of which addresses one cluster of patients individually. In particular, two machine learning models are trained for every cluster of patients in order to predict the inpatient LOS and the discharge destination. The developed models are claimed to make predictions with relatively high accuracy. Furthermore, the potential usefulness of the clustering-guided approach to prediction is discussed in general.
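A minimal sketch of the clustering-guided prediction pattern described above, under several assumptions that are not from the paper: patients are clustered on two hypothetical features (age and days to surgery) so that a new patient can be assigned to a cluster without knowing the LOS, the per-cluster "model" is just the mean LOS of that cluster, and the records and initial centers are invented. In practice the features would usually be scaled first and the per-cluster predictors would be real learners.

```python
import math

def nearest(point, centers):
    """Index of the center closest to point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda j: math.dist(point, centers[j]))

def kmeans(points, centers, iters=25):
    """Lloyd's algorithm starting from the given initial centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Recompute each center as the coordinate-wise mean; keep it if the cluster is empty.
        centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[j]
            for j, g in enumerate(groups)
        ]
    return centers

def fit_per_cluster(records, initial_centers):
    """records: list of ((age, days_to_surgery), los_days)."""
    centers = kmeans([x for x, _ in records], initial_centers)
    targets = [[] for _ in centers]
    for x, los in records:
        targets[nearest(x, centers)].append(los)
    # Placeholder "model" per cluster: the mean LOS of its members.
    models = [sum(t) / len(t) if t else None for t in targets]
    return centers, models

def predict_los(x, centers, models):
    """Route the patient to the nearest cluster and use that cluster's model."""
    return models[nearest(x, centers)]

# Hypothetical toy records, three visibly separated groups.
records = [((66, 1.0), 8.0), ((68, 1.5), 10.0), ((70, 1.0), 9.0),
           ((78, 2.0), 14.0), ((80, 2.0), 15.0), ((82, 2.5), 16.0),
           ((88, 3.0), 20.0), ((90, 3.5), 22.0), ((92, 4.0), 24.0)]
centers, models = fit_per_cluster(records, [(66, 1.0), (80, 2.0), (92, 4.0)])
```

The design point is the routing step in `predict_los`: each prediction is made by a model that only ever saw patients from one cluster, which is what distinguishes this approach from a single global model.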

algorithm, artificial intelligence, machine learning, (18 more...)

AAAI Conferences

Workshops at the Thirty-First AAAI Conference on Artificial Intelligence

Country: Europe > Ireland (0.04)

Genre: Research Report > Experimental Study (0.88)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Health Care Providers & Services (0.95)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)


Energy Disaggregation Methods for Commercial Buildings Using Smart Meter and Operational Data

Bansal, Shubham (École Polytechnique Fédérale de Lausanne) | Schmidt, Mischa (NEC Laboratories Europe)

AAAI Conferences | Feb-4-2017

One of the key pieces of information for improving the energy efficiency of buildings is the appliance-level breakdown of energy consumption. Energy disaggregation is the process of obtaining this breakdown from building-level aggregate data using computational techniques. Most of the current research focuses on residential buildings, obtaining this information from a single smart meter and often relying on high-frequency data. This work is directed at commercial buildings equipped with building management and automation systems that provide low-frequency operational and contextual data. This paper presents a machine learning method to disaggregate the energy consumption of the building using this operational data as input features. Experimental results on two publicly available datasets demonstrate the effectiveness of the approach, which surpasses existing methods. For all but one appliance of House 2 of the publicly available REDD dataset, improvements in normalized error in assigned power range between 20% (Lighting) and 220% (Stove). For another dataset, from an educational facility in Singapore, a disaggregation accuracy of 92% is reported for the facility's cooling system.

appliance, artificial intelligence, machine learning, (19 more...)

AAAI Conferences

Workshops at the Thirty-First AAAI Conference on Artificial Intelligence

Country:

Asia > Singapore (0.25)
Europe > Netherlands > South Holland > Delft (0.04)
Europe > Germany > Baden-Württemberg > Karlsruhe Region > Heidelberg (0.04)
(6 more...)

Industry:

Energy (1.00)
Construction & Engineering (1.00)
Banking & Finance > Real Estate (0.91)

Technology:

Information Technology > Internet of Things (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)


Cluster-based Kriging Approximation Algorithms for Complexity Reduction

van Stein, Bas, Wang, Hao, Kowalczyk, Wojtek, Emmerich, Michael, Bäck, Thomas

arXiv.org Machine Learning | Feb-4-2017

Kriging, or Gaussian Process Regression, is applied in many fields as a nonlinear regression model as well as a surrogate model in the field of evolutionary computation. However, the computational and space complexity of Kriging, which are cubic and quadratic in the number of data points respectively, become a major bottleneck as more and more data becomes available. In this paper, we propose a general methodology for complexity reduction, called cluster Kriging, where the whole data set is partitioned into smaller clusters and multiple Kriging models are built on top of them. In addition, four Kriging approximation algorithms are proposed as candidate algorithms within the new framework. Each of these algorithms can be applied to much larger data sets while maintaining the advantages and power of Kriging. The proposed algorithms are explained in detail and compared empirically against a broad set of existing state-of-the-art Kriging approximation methods on a well-defined testing framework. According to the empirical study, the proposed algorithms consistently outperform the existing algorithms. Moreover, some practical suggestions are provided for using the proposed algorithms.

Kriging, or Gaussian Process Regression [1], is a popular and elegant kernel-based regression model capable of modeling very complex functions. Kriging is used in many fields. Many other regression models exist, such as parametric models, which are easy to interpret but may lack the expressive power to model complex functions.
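The complexity argument can be made concrete with a little arithmetic. The sketch below only counts leading-order terms, cubic time and quadratic memory per model as stated in the abstract, and assumes the data is split into disjoint clusters with one model per cluster. It ignores constant factors, the cost of the clustering step, and the cost of combining the per-cluster predictions. The function names are illustrative and not taken from the paper.

```python
def kriging_cost(n):
    """Leading-order time (n^3) and memory (n^2) cost for one model on n points."""
    return n ** 3, n ** 2

def cluster_kriging_cost(cluster_sizes):
    """Summed cost when every cluster gets its own model.

    Memory is summed on the assumption that all models are held at once;
    training them one at a time would need only the largest term.
    """
    time = sum(m ** 3 for m in cluster_sizes)
    memory = sum(m ** 2 for m in cluster_sizes)
    return time, memory

n, k = 10_000, 10
full_time, full_mem = kriging_cost(n)
part_time, part_mem = cluster_kriging_cost([n // k] * k)

print(full_time // part_time)  # 100, i.e. k**2 for equal-size clusters
print(full_mem // part_mem)    # 10, i.e. k for equal-size clusters
```

For k equal-size clusters the time term drops by a factor of k squared and the memory term by a factor of k, which is why partitioning helps so much as n grows. Unequal cluster sizes reduce the gain, since the largest cluster dominates the sum.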

algorithm, artificial intelligence, machine learning, (18 more...)

arXiv.org Machine Learning

1702.01313

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.95)


Variable selection for clustering with Gaussian mixture models: state of the art

Talibi, Abdelghafour, Achchab, Boujemâa, Lasri, Rafik

arXiv.org Machine Learning | Jan-31-2017

SAA T Laboratory, University of Abdelmalek Essadi, FPL, Larache, Morocco. Corresponding author: Abdelghafour Talibi, a.talibi@uhp.ac.ma

Abstract: Mixture models have become widely used in clustering because of the probabilistic framework on which they are based. However, for modern databases, which are characterized by their large size, these models behave disappointingly when the model is set out, which makes the selection of relevant variables essential for this type of clustering. After recalling the basics of model-based clustering, this article examines the variable selection methods for model-based clustering and presents opportunities for improving these methods.

I. INTRODUCTION

Clustering aims to classify the objects of a population into groups, where the objects in the same group are similar to each other and the objects in different groups are dissimilar. Unlike supervised classification, where the number of groups is known in advance, at least for a sample, in clustering the number of groups is unknown and remains to be estimated. Many fields of research have used clustering methods on their data in order to obtain groups that help to understand and interpret the phenomenon studied.
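As a small illustration of what model-based clustering means, the sketch below assigns a point to a cluster under a one-dimensional Gaussian mixture by computing the posterior probability of each component with Bayes' rule. It is only a sketch under stated assumptions: the mixture weights, means, and standard deviations are fixed by hand rather than estimated from data (estimation is typically done with the EM algorithm), and it does not touch variable selection, which is the article's actual subject.

```python
import math

def normal_pdf(x, mean, std):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def responsibilities(x, weights, means, stds):
    """Posterior probability of each mixture component given x."""
    joint = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, stds)]
    total = sum(joint)  # the mixture density at x
    return [j / total for j in joint]

def assign(x, weights, means, stds):
    """Hard cluster assignment: the component with the largest posterior."""
    r = responsibilities(x, weights, means, stds)
    return max(range(len(r)), key=r.__getitem__)

# Hypothetical two-component mixture with hand-picked parameters.
weights, means, stds = [0.5, 0.5], [0.0, 4.0], [1.0, 1.0]
resp_mid = responsibilities(2.0, weights, means, stds)
```

Because the two components here are symmetric around 2.0, a point at exactly 2.0 is split evenly between them, while points nearer to either mean are assigned to that component. For points very far from every component the densities can underflow to zero, so a production version would work with log-densities.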

artificial intelligence, machine learning, selection, (16 more...)

arXiv.org Machine Learning

1701.08946

Country: Africa > Middle East > Morocco (0.24)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.69)


A Study of FOSS'2013 Survey Data Using Clustering Techniques

A, Mani, Mukherjee, Rebeka

arXiv.org Machine Learning | Jan-31-2017

FOSS is an acronym for Free and Open Source Software. The FOSS 2013 survey primarily targets FOSS contributors, and the relevant anonymized dataset is publicly available under a CC BY-SA license. In this study, the dataset is analyzed from a critical perspective using statistical and clustering techniques (especially multiple correspondence analysis), with a strong focus on women contributors, in order to discover hidden trends and facts. Important inferences are drawn about development practices and other facets of the free software and OSS worlds.

artificial intelligence, machine learning, pattern recognition, (16 more...)

arXiv.org Machine Learning

1701.08302

Country:

Europe (0.29)
Asia > India > West Bengal > Kolkata (0.15)

Genre: Research Report (0.84)

Technology:

Information Technology > Software (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.61)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.61)


General examples -- scikit-learn 0.18.1 documentation

#artificialintelligence | Jan-30-2017, 23:45:20 GMT

This documentation is for scikit-learn version 0.18.1. If you use the software, please consider citing scikit-learn. Applications to real-world problems with some medium-sized datasets or interactive user interfaces. Examples illustrating the calibration of predicted probabilities of classifiers. Examples related to the sklearn.model_selection

artificial intelligence, machine learning, sklearn, (12 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.80)


Self-Adaptation of Activity Recognition Systems to New Sensors

Bannach, David, Jänicke, Martin, Rey, Vitor F., Tomforde, Sven, Sick, Bernhard, Lukowicz, Paul

arXiv.org Machine Learning | Jan-30-2017

Embedded Intelligence, German Research Center for Artificial Intelligence, Kaiserslautern, Germany, {vitor.fortes,paul.lukowicz}@dfki.de

Preprint submitted to Computational Intelligence and Neuroscience, March 15, 2018.

Abstract: Traditional activity recognition systems work on the basis of training, taking a fixed set of sensors into account. In this article, we focus on the question of how pattern recognition can leverage new information sources with no or minimal user input. Thus, we present an approach for opportunistic activity recognition, where ubiquitous sensors lead to dynamically changing input spaces. Our method is a variation of well-established principles of machine learning, relying on unsupervised clustering to discover structure in data and inferring cluster labels from a small number of labeled dates in a semi-supervised manner. Elaborating the challenges, evaluations of over 3000 sensor combinations from three multiuser experiments are presented in detail and show the potential benefit of our approach.

Keywords: Opportunistic Activity Recognition, Unsupervised Learning, Semi-supervised Learning, Classifier Adaptation

1. Introduction

Today, state-of-the-art approaches to activity and context recognition typically assume fixed, narrowly defined system configurations dedicated to tasks that are often also narrowly defined. Such systems can only work when the sensors are known in the training phase, and they cannot adapt to new sensors in their environment. In turn, sensors are ever more present in our lives, although not always available. When moving around, a person may face highly instrumented environments as well as places with little or no intelligent infrastructure. Concerning on-body sensing, a user may carry a varying collection of sensor-enabled devices (mobile phone, watch, headset, etc.) at different, dynamically varying body locations (different pockets, wrist, bag). Thus, in order to realize their full potential, systems need to take advantage of devices that just "happen" to be in the environment, taking into account their current placement and relevance. In our previous work, we investigated how the on-body position and orientation of on-body sensors can be inferred [1, 2], how position shifts can be tolerated [3], and how one sensor can replace another [4]. [...] integration. More precisely, this means answering the question of how a new sensor's data can be integrated into an existing activity recognition system at runtime in order to improve the recognition process. Extending a system that used n sensors to one that uses (n + 1) poses many challenges. For instance, training data is expensive, and thus we cannot expect the new (n + 1) data to be labeled.
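The semi-supervised step mentioned in the abstract, inferring cluster labels from a small number of labeled samples, can be sketched as a majority vote per cluster. This is an illustrative simplification and not the authors' method: it assumes the cluster assignments already exist (from any unsupervised algorithm), and the function names, activity labels, and data are hypothetical.

```python
from collections import Counter

def label_clusters(assignments, labeled):
    """Majority label per cluster.

    assignments: cluster id for every sample, by sample index.
    labeled: {sample index: label} for the few samples that carry a label.
    """
    votes = {}
    for idx, label in labeled.items():
        votes.setdefault(assignments[idx], Counter())[label] += 1
    # Clusters without any labeled sample stay unlabeled (absent from the dict).
    return {cid: c.most_common(1)[0][0] for cid, c in votes.items()}

def propagate(assignments, cluster_labels):
    """Give every sample the label of its cluster, or None if the cluster has none."""
    return [cluster_labels.get(cid) for cid in assignments]

# Hypothetical example: eight samples in three clusters, only three of them labeled.
assignments = [0, 0, 0, 1, 1, 1, 2, 2]
labeled = {0: "walk", 2: "walk", 4: "sit"}
cluster_labels = label_clusters(assignments, labeled)
predicted = propagate(assignments, cluster_labels)
```

Three labels are enough to label six of the eight samples here. The third cluster has no labeled member, so its samples stay unlabeled, which shows where additional user input would be most useful.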

artificial intelligence, classifier, machine learning, (18 more...)

arXiv.org Machine Learning

1701.08528

Country:

North America > United States (0.93)
Europe > Germany > Rhineland-Palatinate > Kaiserslautern (0.24)

Genre: Research Report > Promising Solution (0.48)

Industry: Health & Medicine (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.67)


The Impact of Estimation: A New Method for Clustering and Trajectory Estimation in Patient Flow Modeling

Ranjan, Chitta, Paynabar, Kamran, Helm, Jonathan E., Pan, Julian

arXiv.org Machine Learning | Jan-29-2017

The ability to accurately forecast and control inpatient census, and thereby workloads, is a critical and longstanding problem in hospital management. The majority of the current literature focuses on optimal scheduling of inpatients but largely ignores the accurate estimation of the trajectory of patients throughout the treatment and recovery process. The result is that current scheduling models optimize based on inaccurate input data. We developed a Clustering and Scheduling Integrated (CSI) approach to capture patient flows through a network of hospital services. CSI functions by clustering patients into groups based on similarity of trajectory, using a novel Semi-Markov model (SMM)-based clustering scheme proposed in this paper, as opposed to clustering by admit type or condition as in previous literature. The methodology is validated by simulation and then applied to real patient data from a partner hospital, where it outperforms current methods. Further, we demonstrate that extant optimization methods achieve significantly better results on key hospital performance measures under CSI, compared with traditional estimation approaches, increasing elective admissions by 97% and utilization by 22% compared to 30% and 8% using traditional estimation techniques. From a theoretical standpoint, the SMM-clustering is a novel approach applicable to any temporal-spatial stochastic data that is prevalent in many industries and application areas.

artificial intelligence, data mining, machine learning, (18 more...)

arXiv.org Machine Learning

1505.07752

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.67)

Industry:

Health & Medicine > Health Care Providers & Services (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.68)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
(3 more...)


Riemannian-geometry-based modeling and clustering of network-wide non-stationary time series: The brain-network case

Slavakis, Konstantinos, Salsabilian, Shiva, Wack, David S., Muldoon, Sarah F., Baidoo-Williams, Henry E., Vettel, Jean M., Cieslak, Matthew, Grafton, Scott T.

arXiv.org Machine Learning | Jan-26-2017

This paper advocates Riemannian multi-manifold modeling in the context of network-wide non-stationary time-series analysis. Time-series data, collected sequentially over time and across a network, yield features which are viewed as points in or close to a union of multiple submanifolds of a Riemannian manifold, and distinguishing disparate time series amounts to clustering multiple Riemannian submanifolds. To support the claim that exploiting the latent Riemannian geometry behind many statistical features of time series is beneficial to learning from network data, this paper focuses on brain networks and puts forth two feature-generation schemes for network-wide dynamic time series. The first is motivated by Granger-causality arguments and uses an auto-regressive moving average model to map low-rank linear vector subspaces, spanned by column vectors of appropriately defined observability matrices, to points in the Grassmann manifold. The second utilizes (non-linear) dependencies among network nodes by introducing kernel-based partial correlations to generate points in the manifold of positive-definite matrices. Capitalizing on recently developed research on clustering Riemannian submanifolds, an algorithm is provided for distinguishing time series based on their geometrical properties, revealed within Riemannian feature spaces. Extensive numerical tests demonstrate that the proposed framework outperforms classical and state-of-the-art techniques in clustering brain-network states/structures hidden beneath synthetic fMRI time series and brain-activity signals generated from real brain-network structural connectivity matrices.

artificial intelligence, data mining, machine learning, (17 more...)

arXiv.org Machine Learning

1701.07767

Country:

North America > United States (1.00)
Europe (0.67)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Health Care Technology (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)
