AITopics | Clustering

Collaborating Authors

Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)

News Overviews Instructional Materials AI-Alerts Classics

Decision-Theoretic Clustering of Strategies

Bard, Nolan (University of Alberta) | Nicholas, Deon (University of Waterloo) | Szepesvári, Csaba (University of Alberta) | Bowling, Michael (University of Alberta)

AAAI ConferencesMar-1-2015

Clustering agents by their behaviour can be crucial for building effective agent models. Traditional clustering typically aims to group entities together based on a distance metric, where a desirable clustering is one where the entities in a cluster are spatially close together. Instead, one may desire to cluster based on actionability, or the capacity for the clusters to suggest how an agent should respond to maximize their utility with respect to the entities. Segmentation problems examine this decision-theoretic clustering task. Although finding optimal solutions to these problems is computationally hard, greedy-based approximation algorithms exist. However, in settings where the agent has a combinatorially large number of candidate responses whose utilities must be considered, these algorithms are often intractable. In this work, we show that in many cases the utility function can be factored to allow for an efficient greedy algorithm even when there are exponentially large response spaces. We evaluate our technique theoretically, proving approximation bounds, and empirically using extensive-form games by clustering opponent strategies in toy poker games. Our results demonstrate that these techniques yield dramatically improved clusterings compared to a traditional distance-based clustering approach in terms of both subjective quality and utility obtained by responding to the clusters.

algorithm, artificial intelligence, machine learning, (17 more...)

AAAI Conferences

Workshops at the Twenty-Ninth AAAI Conference on Artificial Intelligence

Country:

North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.04)
North America > Canada > Quebec (0.04)
North America > Canada > Ontario > Waterloo Region > Waterloo (0.04)

Genre: Research Report > New Finding (0.86)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

User Clustering in Online Advertising via Topic Models

Geyik, Sahin Cem, Dasdan, Ali, Lee, Kuang-Chih

arXiv.org Artificial IntelligenceFeb-23-2015

In the domain of online advertising, our aim is to serve the best ad to a user who visits a certain webpage, to maximize the chance of a desired action to be performed by this user after seeing the ad. While it is possible to generate a different prediction model for each user to tell if he/she will act on a given ad, the prediction result typically will be quite unreliable with huge variance, since the desired actions are extremely sparse, and the set of users is huge (hundreds of millions) and extremely volatile, i.e., a lot of new users are introduced everyday, or are no longer valid. In this paper we aim to improve the accuracy in finding users who will perform the desired action, by assigning each user to a cluster, where the number of clusters is much smaller than the number of users (in the order of hundreds). Each user will fall into the same cluster with another user if their event history are similar. For this purpose, we modify the probabilistic latent semantic analysis (pLSA) model by assuming the independence of the user and the cluster id, given the history of events. This assumption helps us to identify a cluster of a new user without re-clustering all the users. We present the details of the algorithm we employed as well as the distributed implementation on Hadoop, and some initial results on the clusters that were generated by the algorithm.

data mining, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

1501.06595

Country: North America > United States (0.28)

Genre: Research Report (0.64)

Industry:

Marketing (1.00)
Information Technology > Services (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.49)

Add feedback

Clustering multi-way data: a novel algebraic approach

Kernfeld, Eric, Aeron, Shuchin, Kilmer, Misha

arXiv.org Machine LearningFeb-21-2015

In this paper, we develop a method for unsupervised clustering of two-way (matrix) data by combining two recent innovations from different fields: the Sparse Subspace Clustering (SSC) algorithm [10], which groups points coming from a union of subspaces into their respective subspaces, and the t-product [18], which was introduced to provide a matrix-like multiplication for third order tensors. Our algorithm is analogous to SSC in that an "affinity" between different data points is built using a sparse self-representation of the data. Unlike SSC, we employ the t-product in the self-representation. This allows us more flexibility in modeling; infact, SSC is a special case of our method. When using the t-product, three-way arrays are treated as matrices whose elements (scalars) are n-tuples or tubes. Convolutions take the place of scalar multiplication. This framework allows us to embed the 2-D data into a vector-space-like structure called a free module over a commutative ring. These free modules retain many properties of complex inner-product spaces, and we leverage that to provide theoretical guarantees on our algorithm. We show that compared to vector-space counterparts, SSmC achieves higher accuracy and better able to cluster data with less preprocessing in some image clustering problems. In particular we show the performance of the proposed method on Weizmann face database, the Extended Yale B Face database and the MNIST handwritten digits database.

artificial intelligence, machine learning, submodule, (18 more...)

arXiv.org Machine Learning

1412.7056

Country: North America > United States (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Pairwise Constraint Propagation: A Survey

Fu, Zhenyong, Lu, Zhiwu

arXiv.org Machine LearningFeb-19-2015

As one of the most important types of (weaker) supervised information in machine learning and pattern recognition, pairwise constraint, which specifies whether a pair of data points occur together, has recently received significant attention, especially the problem of pairwise constraint propagation. At least two reasons account for this trend: the first is that compared to the data label, pairwise constraints are more general and easily to collect, and the second is that since the available pairwise constraints are usually limited, the constraint propagation problem is thus important. This paper provides an up-to-date critical survey of pairwise constraint propagation research. There are two underlying motivations for us to write this survey paper: the first is to provide an up-to-date review of the existing literature, and the second is to offer some insights into the studies of pairwise constraint propagation. To provide a comprehensive survey, we not only categorize existing propagation techniques but also present detailed descriptions of representative methods within each category.

artificial intelligence, constraint propagation, machine learning, (14 more...)

arXiv.org Machine Learning

1502.05752

Country: Asia > China (0.17)

Genre: Overview (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.47)

Add feedback

Clustering by Descending to the Nearest Neighbor in the Delaunay Graph Space

Qiu, Teng, Li, Yongjie

arXiv.org Machine LearningFeb-16-2015

In our previous works, we proposed a physically-inspired rule to organize the data points into an in-tree (IT) structure, in which some undesired edges are allowed to occur. By removing those undesired or redundant edges, this IT structure is divided into several separate parts, each representing one cluster. In this work, we seek to prevent the undesired edges from arising at the source. Before using the physically-inspired rule, data points are at first organized into a proximity graph which restricts each point to select the optimal directed neighbor just among its neighbors. Consequently, separated in-trees or clusters automatically arise, without redundant edges requiring to be removed.

artificial intelligence, graph, machine learning, (15 more...)

arXiv.org Machine Learning

1502.04502

Country: Asia > China (0.15)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.96)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.85)

Add feedback

Active Function Cross-Entropy Clustering

Spurek, P., Tabor, J., Markowicz, P.

arXiv.org Machine LearningFeb-6-2015

Gaussian Mixture Models (GMM) have found many applications in density estimation and data clustering. However, the model does not adapt well to curved and strongly nonlinear data. Recently there appeared an improvement called AcaGMM (Active curve axis Gaussian Mixture Model), which fits Gaussians along curves using an EM-like (Expectation Maximization) approach. Using the ideas standing behind AcaGMM, we build an alternative active function model of clustering, which has some advantages over AcaGMM. In particular it is naturally defined in arbitrary dimensions and enables an easy adaptation to clustering of complicated datasets along the predefined family of functions. Moreover, it does not need external methods to determine the number of clusters as it automatically reduces the number of groups on-line.

acagmm, artificial intelligence, machine learning, (18 more...)

arXiv.org Machine Learning

1502.01943

Country: Europe (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Multiscale Event Detection in Social Media

Dong, Xiaowen, Mavroeidis, Dimitrios, Calabrese, Francesco, Frossard, Pascal

arXiv.org Machine LearningFeb-5-2015

Event detection has been one of the most important research topics in social media analysis. Most of the traditional approaches detect events based on fixed temporal and spatial resolutions, while in reality events of different scales usually occur simultaneously, namely, they span different intervals in time and space. In this paper, we propose a novel approach towards multiscale event detection using social media data, which takes into account different temporal and spatial scales of events in the data. Specifically, we explore the properties of the wavelet transform, which is a well-developed multiscale transform in signal processing, to enable automatic handling of the interaction between temporal and spatial scales. We then propose a novel algorithm to compute a data similarity graph at appropriate scales and detect events of different scales simultaneously by a single graph-based clustering process. Furthermore, we present spatiotemporal statistical analysis of the noisy information present in the data stream, which allows us to define a novel term-filtering procedure for the proposed event detection algorithm and helps us study its behavior using simulated noisy data. Experimental results on both synthetically generated data and real world data collected from Twitter demonstrate the meaningfulness and effectiveness of the proposed approach. Our framework further extends to numerous application domains that involve multiscale and multiresolution data analysis.

artificial intelligence, machine learning, tweet, (16 more...)

arXiv.org Machine Learning

doi: 10.1007/s10618-015-0421-2

1404.7048

Country:

North America > United States > New York > New York County > New York City (0.29)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > California > Los Angeles County > Los Angeles (0.14)
(23 more...)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment (1.00)
Information Technology > Services (0.94)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

Add feedback

Z.til

AI ClassicsJan-25-2015, 22:18:33 GMT

This paper describes some work on automatically generating finite counterexamples in topology, and the use of counterexamples to speed up proof discovery in intermediate analysis, and gives some examples theorems where human provers are aided in proof discovery by the use of examples.

relx group plc, totalenergies se, united nations, (135 more...)

AI Classics

Country:

Asia > Middle East (1.00)
Europe > United Kingdom > England (1.00)
Europe > Germany (0.92)
(3 more...)

Genre:

Overview (1.00)
Research Report > Experimental Study (1.00)
Instructional Material > Course Syllabus & Notes (1.00)
(2 more...)

Industry:

Leisure & Entertainment > Games (1.01)
Health & Medicine > Therapeutic Area > Neurology (1.00)
Government > Regional Government > > > > > > > North America Government (1.00)
(14 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (1.03)
Information Technology > Knowledge Management > Knowledge Engineering (1.02)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.02)
(21 more...)

Add feedback

Revealing conceptual structure in data by inductive inference

AI ClassicsJan-25-2015, 22:17:51 GMT

In many applied sciences there is often a problem of revealing a structure underlying a given collection of objects (situations, measurements, observations, etc.). A specific problem of this type is that of determining a hierarchy of meaningful subcategories in such a collection. This problem has been studied intensively in the area of cluster analysis. The methods developed there, however, formulate subcategories ('clusters') solely on the basis of pairwise'similarity' (or'proximity') of objects, and ignore the issue of the'meaning' of the clusters obtained. The methods do not provide any description of the clusters obtained. This paper presents a method which constructs a hierarchy of subcategories, such that an appropriately generalized description of each subcategory is a single conjunctive statement involving attributes of objects and has a simple conceptual interpretation. The attributes may be many-valued nominal variables or relations on numerical variables. The hierarchy is constructed in such a way that a flexibly defined'cost' of the collection of descriptions which branch from any node is minimized. Experiments with the implemented program, CLUSTER/paf, have shown that for some quite simple problems the traditional methods are unable to produce a structuring of objects most'natural' for people, while the method presented here was able to produce such a solution.

carnegie mellon university, machine learning, natural language, (20 more...)

AI Classics

Country: North America > United States > Illinois (0.29)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.69)

Add feedback

Noisy Sparse Subspace Clustering

Wang, Yu-Xiang, Xu, Huan

arXiv.org Machine LearningJan-22-2015

This paper considers the problem of subspace clustering under noise. Specifically, we study the behavior of Sparse Subspace Clustering (SSC) when either adversarial or random noise is added to the unlabelled input data points, which are assumed to be in a union of low-dimensional subspaces. We show that a modified version of SSC is \emph{provably effective} in correctly identifying the underlying subspaces, even with noisy data. This extends theoretical guarantee of this algorithm to more practical settings and provides justification to the success of SSC in a class of real applications.

artificial intelligence, machine learning, optimization problem, (15 more...)

arXiv.org Machine Learning

1309.1233

Country:

Asia (0.28)
North America > United States (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.93)

Add feedback