AITopics | Clustering

Collaborating Authors

Clustering

Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters). It is a main task of exploratory data mining, and a common technique for statistical data analysis, used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, bioinformatics, data compression, and computer graphics. (Wikipedia)

News Overviews Instructional Materials AI-Alerts Classics

Graph Degree Linkage: Agglomerative Clustering on a Directed Graph

Zhang, Wei, Wang, Xiaogang, Zhao, Deli, Tang, Xiaoou

arXiv.org Machine LearningAug-24-2012

This paper proposes a simple but effective graph-based agglomerative algorithm, for clustering high-dimensional data. We explore the different roles of two fundamental concepts in graph theory, indegree and outdegree, in the context of clustering. The average indegree reflects the density near a sample, and the average outdegree characterizes the local geometry around a sample. Based on such insights, we define the affinity measure of clusters via the product of average indegree and average outdegree. The product-based affinity makes our algorithm robust to noise. The algorithm has three main advantages: good performance, easy implementation, and high computational efficiency. We test the algorithm on two fundamental computer vision problems: image clustering and object matching. Extensive experiments demonstrate that it outperforms the state-of-the-arts in both applications.

affinity, algorithm, outdegree, (14 more...)

arXiv.org Machine Learning

1208.5092

Country:

Asia > China > Hong Kong (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.89)

Add feedback

Semi-supervised Clustering Ensemble by Voting

Iqbal, Ashraf Mohammed, Moh'd, Abidalrahman, Khan, Zahoor

arXiv.org Machine LearningAug-20-2012

Clustering ensemble is one of the most recent advances in unsupervised learning. It aims to combine the clustering results obtained using different algorithms or from different runs of the same clustering algorithm for the same data set, this is accomplished using on a consensus function, the efficiency and accuracy of this method has been proven in many works in literature. In the first part of this paper we make a comparison among current approaches to clustering ensemble in literature. All of these approaches consist of two main steps: the ensemble generation and consensus function. In the second part of the paper, we suggest engaging supervision in the clustering ensemble procedure to get more enhancements on the clustering results. Supervision can be applied in two places: either by using semi-supervised algorithms in the clustering ensemble generation step or in the form of a feedback used by the consensus function stage. Also, we introduce a flexible two parameter weighting mechanism, the first parameter describes the compatibility between the datasets under study and the semi-supervised clustering algorithms used to generate the base partitions, the second parameter is used to provide the user feedback on the these partitions. The two parameters are engaged in a "relabeling and voting" based consensus function to produce the final clustering.

artificial intelligence, data mining, machine learning, (16 more...)

arXiv.org Machine Learning

1208.4138

Country: North America > Canada (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.73)

Add feedback

Achieving Approximate Soft Clustering in Data Streams

Aggarwal, Vaneet, Krishnan, Shankar

arXiv.org Artificial IntelligenceJul-26-2012

In recent years, data streaming has gained prominence due to advances in technologies that enable many applications to generate continuous flows of data. This increases the need to develop algorithms that are able to efficiently process data streams. Additionally, real-time requirements and evolving nature of data streams make stream mining problems, including clustering, challenging research problems. In this paper, we propose a one-pass streaming soft clustering (membership of a point in a cluster is described by a distribution) algorithm which approximates the "soft" version of the k-means objective function. Soft clustering has applications in various aspects of databases and machine learning including density estimation and learning mixture models. We first achieve a simple pseudo-approximation in terms of the "hard" k-means algorithm, where the algorithm is allowed to output more than $k$ centers. We convert this batch algorithm to a streaming one (using an extension of the k-means++ algorithm recently proposed) in the "cash register" model. We also extend this algorithm when the clustering is done over a moving window in the data stream.

algorithm, artificial intelligence, machine learning, (18 more...)

arXiv.org Artificial Intelligence

1207.6199

Country: North America > United States (0.69)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.48)

Add feedback

Capturing Browsing Interests of Users into Web Usage Profiles

Kabir, Shaily (Concordia University) | Mudur, Sudhir P. (Concordia University) | Shiri, Nematollaah (Concordia University)

AAAI ConferencesJul-21-2012

We present a new weighted session similarity measure to capture the browsing interests of users in web usage profiles discovered from web log data. We base our similarity measure on the reasonable assumption that when users spend longer times on pages or revisit pages in the same session, then very likely, such pages are of greater interest to the user. The proposed similarity measure combines structural similarity with session-wise page significance. The latter, representing the degree of user interest, is computed using frequency and duration of a page access. Web usage profiles are generated using this similarity measure by applying a fuzzy clustering algorithm to web log data. For evaluating the effectiveness of the proposed measure, we adapt two model-based collaborative filtering algorithms for recommending pages. Experimental results show considerable improvement in overall performance of recommender systems as compared to use of other existing similarity measures.

AAAI Conferences

Workshops at the Twenty-Sixth AAAI Conference on Artificial Intelligence

Country: North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Web (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Resource Management for Public Sensing

Herrmann, Klaus (University of Stuttgart) | Fischer, Daniel (University of Stuttgart) | Philipp, Damian (University of Stuttgart)

AAAI ConferencesJul-21-2012

Public sensing is a new research area in the fields of wireless sensor networks and mobile computing. It leverages the mobile sensors and system resources readily available in mobile phones to execute sensing tasks. In order to plan, execute and adapt large-scale sensing tasks, applications need to query for the available resources, e.g. the density of certain sensors. We investigate how such information can be provided, and we propose a resource manager for public sensing. Our primary goal is to minimize the energy consumed by the mobile devices to make public sensing feasible without disturbing users. We propose a cluster-based protocol for collecting local views of the resource state using local ad-hoc communication since this is much more energy-efficient than long-range (e.g. cellular) communication. We compare our solution to a standard approach where mobile devices communicate their resource states using the cellular phone network. We show that 65% of the energy is saved and the communication load on the infrastructure is reduced by 90% while an average delivery ratio of 93% is retained.

communication, node, resource state, (16 more...)

AAAI Conferences

Workshops at the Twenty-Sixth AAAI Conference on Artificial Intelligence

Country:

Europe > Germany > Baden-Württemberg > Stuttgart Region > Stuttgart (0.05)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > District of Columbia > Washington (0.04)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)

Industry: Telecommunications (0.48)

Technology:

Information Technology > Communications > Networks > Sensor Networks (1.00)
Information Technology > Communications > Mobile (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.47)

Add feedback

Bayesian Unification of Sound Source Localization and Separation with Permutation Resolution

Otsuka, Takuma (Kyoto University) | Ishiguro, Katsuhiko (NTT Corporation) | Sawada, Hiroshi (NTT Corporation) | Okuno, Hiroshi G. (Kyoto University)

AAAI ConferencesJul-21-2012

Sound source localization and separation with permutation resolution are essential for achieving a computational auditory scene analysis system that can extract useful information from a mixture of various sounds. Because existing methods cope separately with these problems despite their mutual dependence, the overall result with these approaches can be degraded by any failure in one of these components. This paper presents a unified Bayesian framework to solve these problems simultaneously where localization and separation are regarded as a clustering problem. Experimental results confirm that our method outperforms state-of-the-art methods in terms of the separation quality with various setups including practical reverberant environments.

localization, resolution, separation, (12 more...)

AAAI Conferences

Twenty-Sixth AAAI Conference on Artificial Intelligence

Country:

Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.05)
Asia > Middle East > Jordan (0.04)
North America > United States > New York (0.04)
(2 more...)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

Add feedback

Crowdclustering with Sparse Pairwise Labels: A Matrix Completion Approach

Yi, Jinfeng (Michigan State University) | Jin, Rong (Michigan State University) | Jain, Anil (Michigan State University) | Jain, Shaili (Yale University)

AAAI ConferencesJul-21-2012

Crowdsourcing utilizes human ability by distributing tasks to a large number of workers. It is especially suitable for solving data clustering problems because it provides a way to obtain a similarity measure between objects based on manual annotations, which capture the human perception of similarity among objects.This is in contrast to most clustering algorithms that face the challenge of finding an appropriate similarity measure for the given dataset. Several algorithms have been developed for crowdclustering that combine partial clustering results, each obtained by annotations provided by a different worker, into a single data partition. However, existing crowd-clustering approaches require a large number of annotations, due to the noisy nature of human annotations, leading to a high computational cost in addition to the large cost associated with annotation. We address this problem by developing a novel approach for crowclustering that exploits the technique of matrix completion. Instead of using all the annotations, the proposed algorithm constructs a partially observed similarity matrix based on a subset of pairwise annotation labels that are agreed upon by most annotators. It then deploys the matrix completion algorithm to complete the similarity matrix and obtains the final data partition by applying a spectral clustering algorithm to the completed similarity matrix. We show, both theoretically and empirically, that the proposed approach needs only a small number of manual annotations to obtain an accurate data partition. In effect, we highlight the trade-off between a large number of noisy crowdsourced labels and a small number of high quality labels.

algorithm, artificial intelligence, machine learning, (16 more...)

AAAI Conferences

Workshops at the Twenty-Sixth AAAI Conference on Artificial Intelligence

Country:

Asia > Middle East > Jordan (0.05)
North America > United States > Michigan > Ingham County > Lansing (0.04)
North America > United States > Michigan > Ingham County > East Lansing (0.04)
(2 more...)

Genre: Research Report (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Towards Action Representation within the Framework of Conceptual Spaces: Preliminary Results

Beyer, Oliver (CITEC Bielefeld University) | Griffiths, Sascha (CITEC, Bielefeld University) | Cimiano, Philipp (CITEC, Bielefeld University)

AAAI ConferencesJul-21-2012

We propose an approach for the representation of actions based on the conceptual spaces framework developed by Gärdenfors (2004). Action categories are regarded as properties in the sense of Gärdenfors (2011) and are understood as convex regions in action space. Action categories are mainly described by a force signature that represents the forces that act upon a main trajector involved in the action. This force signature is approximated via a representation that specifies the time-indexed position of the trajector relative to several landmarks. We also present a computational approach to extract such representations from video data. We present results on the Motionese dataset consisting of videos of parents demonstrating actions on objects to their children. We evaluate the representations on a clustering and a classification task showing that, while our representations seems to be reasonable, only a handful of actions can be discriminated reliably.

artificial intelligence, machine learning, representation, (14 more...)

AAAI Conferences

Workshops at the Twenty-Sixth AAAI Conference on Artificial Intelligence

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report (0.47)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Add feedback

Parsing Outdoor Scenes from Streamed 3D Laser Data Using Online Clustering and Incremental Belief Updates

Triebel, Rudolph A. (University of Oxford) | Paul, Rohan (University of Oxford) | Rus, Daniela (Massachusetts Institute of Technology) | Newman, Paul (University of Oxford)

AAAI ConferencesJul-21-2012

In this paper, we address the problem of continually parsing a stream of 3D point cloud data acquired from a laser sensor mounted on a road vehicle. We leverage an online star clustering algorithm coupled with an incremental belief update in an evolving undirected graphical model. The fusion of these techniques allows the robot to parse streamed data and to continually improve its understanding of the world. The core competency produced is an ability to infer object classes from similarities based on appearance and shape features, and to concurrently combine that with a spatial smoothing algorithm incorporating geometric consistency. This formulation of feature-space star clustering modulating the potentials of a spatial graphical model is entirely novel. In our method, the two sources of information: feature similarity and geometrical consistency are fed continu- ally into the system, improving the belief over the class distributions as new data arrives. The algorithm obviates the need for hand-labeled training data and makes no apriori assumptions on the number or characteristics of object categories. Rather, they are learnt incrementally over time from streamed input data. In experiments per- formed on real 3D laser data from an outdoor scene, we show that our approach is capable of obtaining an ever- improving unsupervised scene categorization.

artificial intelligence, machine learning, natural language, (19 more...)

AAAI Conferences

Twenty-Sixth AAAI Conference on Artificial Intelligence

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Industry: Automobiles & Trucks (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.71)
Information Technology > Artificial Intelligence > Representation & Reasoning > Belief Revision (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.50)

Add feedback

Repeated Sequential Auctions with Dynamic Task Clusters

Heap, Bradford Gregory John (University of New South Wales) | Pagnucco, Maurice (University of New South Wales)

AAAI ConferencesJul-21-2012

Sequential auctions can be used to provide solutions to the multi-robot task-allocation problem. In this paper we extend previous work on sequential auctions and propose an algorithm that clusters and auctions uninitiated task clusters repeatedly upon the completion of individual tasks. We demonstrate empirically that our algorithm results in lower overall team costs than other sequential auction algorithms that only assign tasks once.

artificial intelligence, auction, machine learning, (16 more...)

AAAI Conferences

Twenty-Sixth AAAI Conference on Artificial Intelligence

Country:

Oceania > Australia > New South Wales > Sydney (0.04)
Africa > Sudan (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

Add feedback