Clustering
Graph Degree Linkage: Agglomerative Clustering on a Directed Graph
Zhang, Wei, Wang, Xiaogang, Zhao, Deli, Tang, Xiaoou
This paper proposes a simple but effective graph-based agglomerative algorithm, for clustering high-dimensional data. We explore the different roles of two fundamental concepts in graph theory, indegree and outdegree, in the context of clustering. The average indegree reflects the density near a sample, and the average outdegree characterizes the local geometry around a sample. Based on such insights, we define the affinity measure of clusters via the product of average indegree and average outdegree. The product-based affinity makes our algorithm robust to noise. The algorithm has three main advantages: good performance, easy implementation, and high computational efficiency. We test the algorithm on two fundamental computer vision problems: image clustering and object matching. Extensive experiments demonstrate that it outperforms the state-of-the-arts in both applications.
Semi-supervised Clustering Ensemble by Voting
Iqbal, Ashraf Mohammed, Moh'd, Abidalrahman, Khan, Zahoor
Clustering ensemble is one of the most recent advances in unsupervised learning. It aims to combine the clustering results obtained using different algorithms or from different runs of the same clustering algorithm for the same data set, this is accomplished using on a consensus function, the efficiency and accuracy of this method has been proven in many works in literature. In the first part of this paper we make a comparison among current approaches to clustering ensemble in literature. All of these approaches consist of two main steps: the ensemble generation and consensus function. In the second part of the paper, we suggest engaging supervision in the clustering ensemble procedure to get more enhancements on the clustering results. Supervision can be applied in two places: either by using semi-supervised algorithms in the clustering ensemble generation step or in the form of a feedback used by the consensus function stage. Also, we introduce a flexible two parameter weighting mechanism, the first parameter describes the compatibility between the datasets under study and the semi-supervised clustering algorithms used to generate the base partitions, the second parameter is used to provide the user feedback on the these partitions. The two parameters are engaged in a "relabeling and voting" based consensus function to produce the final clustering.
Achieving Approximate Soft Clustering in Data Streams
Aggarwal, Vaneet, Krishnan, Shankar
In recent years, data streaming has gained prominence due to advances in technologies that enable many applications to generate continuous flows of data. This increases the need to develop algorithms that are able to efficiently process data streams. Additionally, real-time requirements and evolving nature of data streams make stream mining problems, including clustering, challenging research problems. In this paper, we propose a one-pass streaming soft clustering (membership of a point in a cluster is described by a distribution) algorithm which approximates the "soft" version of the k-means objective function. Soft clustering has applications in various aspects of databases and machine learning including density estimation and learning mixture models. We first achieve a simple pseudo-approximation in terms of the "hard" k-means algorithm, where the algorithm is allowed to output more than $k$ centers. We convert this batch algorithm to a streaming one (using an extension of the k-means++ algorithm recently proposed) in the "cash register" model. We also extend this algorithm when the clustering is done over a moving window in the data stream.
Capturing Browsing Interests of Users into Web Usage Profiles
Kabir, Shaily (Concordia University) | Mudur, Sudhir P. (Concordia University) | Shiri, Nematollaah (Concordia University)
We present a new weighted session similarity measure to capture the browsing interests of users in web usage profiles discovered from web log data. We base our similarity measure on the reasonable assumption that when users spend longer times on pages or revisit pages in the same session, then very likely, such pages are of greater interest to the user. The proposed similarity measure combines structural similarity with session-wise page significance. The latter, representing the degree of user interest, is computed using frequency and duration of a page access. Web usage profiles are generated using this similarity measure by applying a fuzzy clustering algorithm to web log data. For evaluating the effectiveness of the proposed measure, we adapt two model-based collaborative filtering algorithms for recommending pages. Experimental results show considerable improvement in overall performance of recommender systems as compared to use of other existing similarity measures.
Resource Management for Public Sensing
Herrmann, Klaus (University of Stuttgart) | Fischer, Daniel (University of Stuttgart) | Philipp, Damian (University of Stuttgart)
Public sensing is a new research area in the fields of wireless sensor networks and mobile computing. It leverages the mobile sensors and system resources readily available in mobile phones to execute sensing tasks. In order to plan, execute and adapt large-scale sensing tasks, applications need to query for the available resources, e.g. the density of certain sensors. We investigate how such information can be provided, and we propose a resource manager for public sensing. Our primary goal is to minimize the energy consumed by the mobile devices to make public sensing feasible without disturbing users. We propose a cluster-based protocol for collecting local views of the resource state using local ad-hoc communication since this is much more energy-efficient than long-range (e.g. cellular) communication. We compare our solution to a standard approach where mobile devices communicate their resource states using the cellular phone network. We show that 65% of the energy is saved and the communication load on the infrastructure is reduced by 90% while an average delivery ratio of 93% is retained.
Bayesian Unification of Sound Source Localization and Separation with Permutation Resolution
Otsuka, Takuma (Kyoto University) | Ishiguro, Katsuhiko (NTT Corporation) | Sawada, Hiroshi (NTT Corporation) | Okuno, Hiroshi G. (Kyoto University)
Sound source localization and separation with permutation resolution are essential for achieving a computational auditory scene analysis system that can extract useful information from a mixture of various sounds. Because existing methods cope separately with these problems despite their mutual dependence, the overall result with these approaches can be degraded by any failure in one of these components. This paper presents a unified Bayesian framework to solve these problems simultaneously where localization and separation are regarded as a clustering problem. Experimental results confirm that our method outperforms state-of-the-art methods in terms of the separation quality with various setups including practical reverberant environments.
Crowdclustering with Sparse Pairwise Labels: A Matrix Completion Approach
Yi, Jinfeng (Michigan State University) | Jin, Rong (Michigan State University) | Jain, Anil (Michigan State University) | Jain, Shaili (Yale University)
Crowdsourcing utilizes human ability by distributing tasks to a large number of workers. It is especially suitable for solving data clustering problems because it provides a way to obtain a similarity measure between objects based on manual annotations, which capture the human perception of similarity among objects.This is in contrast to most clustering algorithms that face the challenge of finding an appropriate similarity measure for the given dataset. Several algorithms have been developed for crowdclustering that combine partial clustering results, each obtained by annotations provided by a different worker, into a single data partition. However, existing crowd-clustering approaches require a large number of annotations, due to the noisy nature of human annotations, leading to a high computational cost in addition to the large cost associated with annotation. We address this problem by developing a novel approach for crowclustering that exploits the technique of matrix completion. Instead of using all the annotations, the proposed algorithm constructs a partially observed similarity matrix based on a subset of pairwise annotation labels that are agreed upon by most annotators. It then deploys the matrix completion algorithm to complete the similarity matrix and obtains the final data partition by applying a spectral clustering algorithm to the completed similarity matrix. We show, both theoretically and empirically, that the proposed approach needs only a small number of manual annotations to obtain an accurate data partition. In effect, we highlight the trade-off between a large number of noisy crowdsourced labels and a small number of high quality labels.
Towards Action Representation within the Framework of Conceptual Spaces: Preliminary Results
Beyer, Oliver (CITEC Bielefeld University) | Griffiths, Sascha (CITEC, Bielefeld University) | Cimiano, Philipp (CITEC, Bielefeld University)
We propose an approach for the representation of actions based on the conceptual spaces framework developed by Gรคrdenfors (2004). Action categories are regarded as properties in the sense of Gรคrdenfors (2011) and are understood as convex regions in action space. Action categories are mainly described by a force signature that represents the forces that act upon a main trajector involved in the action. This force signature is approximated via a representation that specifies the time-indexed position of the trajector relative to several landmarks. We also present a computational approach to extract such representations from video data. We present results on the Motionese dataset consisting of videos of parents demonstrating actions on objects to their children. We evaluate the representations on a clustering and a classification task showing that, while our representations seems to be reasonable, only a handful of actions can be discriminated reliably.
Parsing Outdoor Scenes from Streamed 3D Laser Data Using Online Clustering and Incremental Belief Updates
Triebel, Rudolph A. (University of Oxford) | Paul, Rohan (University of Oxford) | Rus, Daniela (Massachusetts Institute of Technology) | Newman, Paul (University of Oxford)
In this paper, we address the problem of continually parsing a stream of 3D point cloud data acquired from a laser sensor mounted on a road vehicle. We leverage an online star clustering algorithm coupled with an incremental belief update in an evolving undirected graphical model. The fusion of these techniques allows the robot to parse streamed data and to continually improve its understanding of the world. The core competency produced is an ability to infer object classes from similarities based on appearance and shape features, and to concurrently combine that with a spatial smoothing algorithm incorporating geometric consistency. This formulation of feature-space star clustering modulating the potentials of a spatial graphical model is entirely novel. In our method, the two sources of information: feature similarity and geometrical consistency are fed continu- ally into the system, improving the belief over the class distributions as new data arrives. The algorithm obviates the need for hand-labeled training data and makes no apriori assumptions on the number or characteristics of object categories. Rather, they are learnt incrementally over time from streamed input data. In experiments per- formed on real 3D laser data from an outdoor scene, we show that our approach is capable of obtaining an ever- improving unsupervised scene categorization.
Repeated Sequential Auctions with Dynamic Task Clusters
Heap, Bradford Gregory John (University of New South Wales) | Pagnucco, Maurice (University of New South Wales)
Sequential auctions can be used to provide solutions to the multi-robot task-allocation problem. In this paper we extend previous work on sequential auctions and propose an algorithm that clusters and auctions uninitiated task clusters repeatedly upon the completion of individual tasks. We demonstrate empirically that our algorithm results in lower overall team costs than other sequential auction algorithms that only assign tasks once.