 Clustering


A Unified Framework for Planning and Execution-Monitoring of Mobile Robots

AAAI Conferences

We present an original integration of high-level planning and execution with incoming perceptual information from vision, SLAM, topological map segmentation, and dialogue. The task of the robot system implementing the integrated model is to explore unknown areas and report detected objects to an operator by speaking loudly. The planner's knowledge base maintains a graph-based representation of the metric map, which is dynamically constructed via an unsupervised topological segmentation method and augmented with the type and position of detected objects within the map, such as cars or containers. From this knowledge the cognitive robot can infer strategies, generating parametric plans that are instantiated from the perceptual processes. Finally, a model-based approach for the execution and control of the robot system is proposed to monitor, concurrently, the low-level status of the system and the execution of the activities needed to achieve the goal instructed by the operator.


Energy Outlier Detection in Smart Environments

AAAI Conferences

Despite a dramatic growth of power consumption in households, less attention has been paid to monitoring, analyzing and predicting energy usage. In this paper, we propose a framework to mine raw energy data by transforming time series energy data into a symbol sequence, and then extend a suffix tree data structure as an efficient representation to analyze global structural patterns. Then, we use a clustering algorithm to detect energy pattern outliers which are far from their cluster centroids. To validate our approach, we use real power data collected from a smart apartment testbed during two months.
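As a rough illustration of two steps named in this abstract (turning a time series into a symbol sequence, and flagging points far from their cluster centroid), here is a minimal Python sketch. It is not the authors' method: the equal-width binning, the `factor` threshold, and the function names `symbolize` and `centroid_outliers` are assumptions made for the example, and the suffix tree step is omitted entirely.

```python
import math
from collections import defaultdict


def symbolize(series, n_bins=4, alphabet="abcd"):
    """Map each reading to a letter using equal-width bins (assumed scheme)."""
    lo, hi = min(series), max(series)
    # Fall back to a width of 1.0 when the series is constant.
    width = (hi - lo) / n_bins or 1.0
    symbols = []
    for x in series:
        # The maximum value would land in bin n_bins, so clamp it.
        idx = min(int((x - lo) / width), n_bins - 1)
        symbols.append(alphabet[idx])
    return "".join(symbols)


def centroid_outliers(points, labels, factor=1.5):
    """Return indices of points whose distance to their own cluster
    centroid exceeds factor * (mean distance within that cluster).
    The threshold rule is an assumption, not taken from the paper."""
    groups = defaultdict(list)
    for i, lab in enumerate(labels):
        groups[lab].append(i)

    outliers = []
    for idxs in groups.values():
        dim = len(points[idxs[0]])
        centroid = [sum(points[i][d] for i in idxs) / len(idxs)
                    for d in range(dim)]
        dists = {i: math.dist(points[i], centroid) for i in idxs}
        mean_dist = sum(dists.values()) / len(dists)
        outliers.extend(i for i, d in dists.items() if d > factor * mean_dist)
    return sorted(outliers)


if __name__ == "__main__":
    print(symbolize([0, 1, 2, 3, 4]))
    print(centroid_outliers([(1.0,), (1.1,), (0.9,), (5.0,)], [0, 0, 0, 0]))
```

The cluster labels are taken as given here; any clustering algorithm could supply them.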


Clustering via Dirichlet Process Mixture Models for Portable Skill Discovery

AAAI Conferences

Skill discovery algorithms in reinforcement learning typically identify single states or regions in state space that correspond to potential task-specific subgoals. However, such methods do not directly address the question of how many distinct skills are appropriate for solving the tasks that the agent faces. This can be highly inefficient when many identified subgoals correspond to the same underlying skill, but are all used individually as skill goals. Furthermore, skills created in this manner are often only transferable to tasks that share identical state spaces, since corresponding subgoals across tasks are not merged into a single skill goal. We show that these problems can be overcome by clustering subgoal data defined in an agent-space and using the resulting clusters as templates for skill termination conditions. Clustering via a Dirichlet process mixture model is used to discover a minimal, sufficient collection of portable skills.


Learning a Kernel for Multi-Task Clustering

AAAI Conferences

Multi-task learning has received increasing attention in the past decade. Many supervised multi-task learning methods have been proposed, while unsupervised multi-task learning is still a rarely studied problem. In this paper, we propose to learn a kernel for multi-task clustering. Our goal is to learn a Reproducing Kernel Hilbert Space, in which the geometric structure of the data in each task is preserved, while the data distributions of any two tasks are as close as possible. This is formulated as a unified kernel learning framework, under which we study two types of kernel learning: nonparametric kernel learning and spectral kernel design. Both types of kernel learning can be solved by linear programming. Experiments on several cross-domain text data sets demonstrate that kernel k-means on the learned kernel can achieve better clustering results than traditional single-task clustering methods. It also outperforms the newly proposed multi-task clustering method.


Multi-Level Cluster Indicator Decompositions of Matrices and Tensors

AAAI Conferences

A major challenge for many machine learning and data mining applications is that the amount of data and the number of features are very large, so that low-rank approximations of the original data are often required for efficient computation. We propose new multi-level clustering based low-rank matrix approximations which are comparable to, and even more compact than, Singular Value Decomposition (SVD). We utilize the cluster indicators of data clustering results to form the subspaces, hence our decomposition results are more interpretable. We further generalize our clustering based matrix decompositions to tensor decompositions that are useful in high-order data analysis. We also provide an upper bound for the approximation error of our tensor decomposition algorithm. In all experimental results, our methods significantly outperform traditional decomposition methods such as SVD and high-order SVD.


Discovering Latent Strategies

AAAI Conferences

Strategy mining is a new area of research about discovering strategies in decision-making. In this paper, we formulate the strategy-mining problem as a clustering problem, called the latent-strategy problem. In a latent-strategy problem, a corpus of data instances is given, each of which is represented by a set of features and a decision label. The inherent dependency of the decision label on the features is governed by a latent strategy. The objective is to find clusters, each of which contains data instances governed by the same strategy. Existing clustering algorithms are inappropriate for clustering dependency because they either assume feature independence (e.g., K-means) or only consider the co-occurrence of features without explicitly modeling the special dependency of the decision label on other features (e.g., Latent Dirichlet Allocation (LDA)). In this paper, we present a baseline unsupervised learning algorithm for dependency clustering. Our model-based clustering algorithm iterates between an assignment step and a minimization step to learn a mixture of decision tree models that represent latent strategies. Similar to the Expectation Maximization algorithm, our algorithm is grounded in statistical learning theory. Unlike other clustering algorithms, our algorithm is resistant to irrelevant features, and its learned clusters (modeled by decision trees) are strongly interpretable and predictive. We systematically evaluate our algorithm using a common law dataset comprised of actual cases. Experimental results show that our algorithm significantly outperforms K-means and LDA on clustering dependency.


SemRec: A Semantic Enhancement Framework for Tag Based Recommendation

AAAI Conferences

Collaborative tagging services provided by various social web sites have become a popular means of marking web resources for different purposes, such as categorization and expression of a preference. However, the tags are syntactic in nature, free in style, and do not reflect semantics, resulting in problems of redundancy, ambiguity, and weak semantics. Current tag-based recommender systems mainly take the explicit structural information among users, resources and tags into consideration, while neglecting the important implicit semantic relationships hidden in tagging data. In this study, we propose a Semantic Enhancement Recommendation strategy (SemRec), based on both structural information and semantic information through a unified fusion model. Extensive experiments conducted on two real datasets demonstrate the effectiveness of our approaches.


Cross-Language Latent Relational Search: Mapping Knowledge across Languages

AAAI Conferences

Latent relational search (LRS) is a novel approach for mapping knowledge across two domains. Given source-domain knowledge concerning the Moon, "The Moon is a satellite of the Earth," one can form a question {(Moon, Earth), (Ganymede, ?)} to query an LRS engine for new knowledge in the target domain concerning Ganymede. An LRS engine relies on supporting sentences such as "Ganymede is a natural satellite of Jupiter." to retrieve and rank "Jupiter" as the first answer. This paper proposes cross-language latent relational search (CLRS) to extend the knowledge mapping capability of LRS from cross-domain knowledge mapping to cross-domain and cross-language knowledge mapping. In CLRS, the supporting sentences for the source pair might be in a different language from that of the target pair. We represent the relation between two entities in an entity pair by lexical patterns of the context surrounding the two entities. We then propose a novel hybrid lexical pattern clustering algorithm to capture the semantic similarity between paraphrased lexical patterns across languages. Experiments on Japanese-English datasets show that the proposed method achieves an MRR of 0.579 for the CLRS task, which is comparable to the MRR of an existing monolingual LRS engine.
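For readers unfamiliar with the MRR figure quoted in this abstract, the following Python sketch shows how mean reciprocal rank is conventionally computed: each query contributes 1/rank of its first correct answer (or 0 if the answer is absent), and the contributions are averaged. The function name `mean_reciprocal_rank` and the sample queries are hypothetical; they are not drawn from the paper's datasets.

```python
def mean_reciprocal_rank(ranked_answers, gold):
    """Average of 1/rank of the correct answer over all queries.

    ranked_answers: list of ranked candidate lists, one per query.
    gold: list of correct answers, aligned with ranked_answers.
    A query whose correct answer is missing contributes 0.
    """
    total = 0.0
    for answers, correct in zip(ranked_answers, gold):
        if correct in answers:
            total += 1.0 / (answers.index(correct) + 1)
    return total / len(gold)


if __name__ == "__main__":
    # Hypothetical engine output for three queries.
    ranked = [["Jupiter", "Saturn"], ["Mars", "Earth"], ["Venus"]]
    gold = ["Jupiter", "Earth", "Pluto"]
    print(mean_reciprocal_rank(ranked, gold))  # (1 + 1/2 + 0) / 3
```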


Identifying Missing Node Information in Social Networks

AAAI Conferences

In recent years, social networks have surged in popularity as one of the main applications of the Internet. This has generated great interest in researching these networks by various fields in the scientific community. One key aspect of social network research is identifying important missing information which is not explicitly represented in the network, or is not visible to all. To date, this line of research typically focused on what connections were missing between nodes, or what is termed the "Missing Link Problem." This paper introduces a new Missing Nodes Identification problem where missing members in the social network structure must be identified. Towards solving this problem, we present an approach based on clustering algorithms combined with measures from missing link research. We show that this approach has beneficial results in the missing nodes identification process and we measure its performance in several different scenarios.


Exploiting Phase Transition in Latent Networks for Clustering

AAAI Conferences

In this paper, we model the pair-wise similarities of a set of documents as a weighted network with a single cutoff parameter. Such a network can be thought of as an ensemble of unweighted graphs, each consisting of edges with weights greater than the cutoff value. We look at this network ensemble as a complex system with a temperature parameter, and refer to it as a Latent Network. Our experiments on a number of datasets from two different domains show that certain properties of latent networks, such as clustering coefficient, average shortest path, and connected components, exhibit patterns that are significantly divergent from randomized networks. We explain that these patterns reflect the network phase transition as well as the existence of a community structure in document collections. Using numerical analysis, we show that we can use the aforementioned network properties to predict the clustering Normalized Mutual Information (NMI) with high correlation (rho > 0.9). Finally, we show that our clustering method significantly outperforms other baseline methods (NMI > 0.5).
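The idea of viewing a weighted similarity network as an ensemble of unweighted graphs, one per cutoff value, can be sketched in a few lines of Python. The example below keeps only edges whose weight exceeds the cutoff and counts connected components with a union-find structure; sweeping the cutoff then shows how the component count changes. The function name `components_at_cutoff` and the toy edge weights are assumptions for illustration and do not reproduce the paper's experiments or its other network measures.

```python
def components_at_cutoff(n, weighted_edges, cutoff):
    """Count connected components of the unweighted graph that keeps
    only edges with weight strictly greater than cutoff.

    n: number of nodes, labelled 0..n-1.
    weighted_edges: iterable of (u, v, weight) tuples.
    """
    parent = list(range(n))

    def find(x):
        # Walk to the root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v, w in weighted_edges:
        if w > cutoff:
            parent[find(u)] = find(v)

    return len({find(i) for i in range(n)})


if __name__ == "__main__":
    # Toy document-similarity network (weights are made up).
    edges = [(0, 1, 0.9), (1, 2, 0.4), (2, 3, 0.8)]
    for cutoff in (0.3, 0.5, 0.95):
        print(cutoff, components_at_cutoff(4, edges, cutoff))
```

Raising the cutoff removes weaker edges, so the component count can only stay the same or grow.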