Selective Transfer Between Learning Tasks Using Task-Based Boosting

AAAI Conferences

The success of transfer learning on a target task is highly dependent on the selected source data. Instance transfer methods reuse data from the source tasks to augment the training data for the target task. If poorly chosen, this source data may inhibit learning, resulting in negative transfer. The current most widely used algorithm for instance transfer, TrAdaBoost, performs poorly when given irrelevant source data. We present a novel task-based boosting technique for instance transfer that selectively chooses the source knowledge to transfer to the target task. Our approach performs boosting at both the instance level and the task level, assigning higher weight to those source tasks that show positive transferability to the target task, and adjusting the weights of individual instances within each source task via AdaBoost. We show that this combination of task- and instance-level boosting significantly improves transfer performance over existing instance transfer algorithms when given a mix of relevant and irrelevant source data, especially for small amounts of data on the target task.


Symmetric Graph Regularized Constraint Propagation

AAAI Conferences

This paper presents a novel symmetric graph regularization framework for pairwise constraint propagation. We first decompose the challenging problem of pairwise constraint propagation into a series of two-class label propagation subproblems and then deal with these subproblems by quadratic optimization with symmetric graph regularization. More importantly, we clearly show that pairwise constraint propagation is actually equivalent to solving a Lyapunov matrix equation, which is widely used in Control Theory as a standard continuous-time equation. Unlike most previous constraint propagation methods, which suffer from severe limitations, our method can be applied directly to multi-class problems and can effectively exploit both must-link and cannot-link constraints. The propagated constraints are further used to adjust the similarity between data points so that they can be incorporated into subsequent clustering. The proposed method has been tested in clustering tasks on six real-life data sets and shown to achieve significant improvements over the state of the art.
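
As a rough illustration of what solving a Lyapunov matrix equation involves, the sketch below solves $AX + XA = Q$ for a symmetric $A$ by eigendecomposition. The abstract does not give the coefficient matrices the authors use, so `A` (identity plus a scaled normalized graph Laplacian), `Q` (an initial must-link/cannot-link matrix), the weight `mu`, and the function name are all placeholders chosen for illustration, not the paper's formulation.

```python
import numpy as np

def solve_symmetric_lyapunov(A, Q):
    """Solve A X + X A = Q for X, assuming A is symmetric and no pair of
    its eigenvalues sums to zero (true when A is positive definite)."""
    lam, V = np.linalg.eigh(A)            # A = V diag(lam) V^T
    Q_tilde = V.T @ Q @ V                 # rotate Q into the eigenbasis of A
    # In that basis the equation decouples: (lam_i + lam_j) * X_ij = Q_ij.
    X_tilde = Q_tilde / (lam[:, None] + lam[None, :])
    return V @ X_tilde @ V.T              # rotate the solution back

# Hypothetical 4-point similarity graph (placeholder data).
W = np.array([[0.0, 1.0, 0.2, 0.0],
              [1.0, 0.0, 0.3, 0.1],
              [0.2, 0.3, 0.0, 1.0],
              [0.0, 0.1, 1.0, 0.0]])
d = W.sum(axis=1)
L = np.eye(4) - W / np.sqrt(np.outer(d, d))   # normalized graph Laplacian

mu = 0.5                                      # placeholder regularization weight
A = np.eye(4) + mu * L                        # symmetric positive definite

# Placeholder initial constraints: +1 = must-link, -1 = cannot-link.
Q = np.zeros((4, 4))
Q[0, 1] = Q[1, 0] = 1.0
Q[0, 3] = Q[3, 0] = -1.0

X = solve_symmetric_lyapunov(A, Q)
print("residual:", np.abs(A @ X + X @ A - Q).max())
```

The eigendecomposition route is used here only because it keeps the example dependency-free; a general-purpose Lyapunov solver would serve equally well.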


Basis Function Discovery Using Spectral Clustering and Bisimulation Metrics

AAAI Conferences

We study the problem of automatically generating features for function approximation in reinforcement learning. We build on the work of Mahadevan and his colleagues, who pioneered the use of spectral clustering methods for basis function construction. Their methods work on top of a graph that captures state adjacency. Instead, we use bisimulation metrics in order to provide state distances for spectral clustering. The advantage of these metrics is that they incorporate reward information in a natural way, in addition to the state transition information. We provide theoretical bounds on the quality of the obtained approximation, which justify the importance of incorporating reward information. We also demonstrate empirically that the approximation quality improves when bisimulation metrics are used instead of the state adjacency graph in the basis function construction process.


Across-Model Collective Ensemble Classification

AAAI Conferences

Ensemble classification methods that independently construct component models (e.g., bagging) improve accuracy over single models by reducing the error due to variance. Some work has been done to extend ensemble techniques for classification in relational domains by taking relational data characteristics or multiple link types into account during model construction. However, since these approaches follow the conventional approach to ensemble learning, they improve performance by reducing the error due to variance in learning. We note, however, that variance in inference can be an additional source of error in relational methods that use collective classification, since inferred values are propagated during inference. We propose a novel ensemble mechanism for collective classification that reduces both learning and inference variance, by incorporating prediction averaging into the collective inference process itself. We show that our proposed method significantly outperforms a straightforward relational ensemble baseline on both synthetic and real-world datasets.


Unsupervised Learning of Human Behaviours

AAAI Conferences

Behaviour recognition is the process of inferring the behaviour of an individual from a series of observations acquired from sensors, such as those in a smart home. The majority of existing behaviour recognition systems are based on supervised learning algorithms, which means that training them requires a preprocessed, annotated dataset. Unfortunately, annotating a dataset is a rather tedious process and one that is prone to error. In this paper we suggest a way to identify structure in the data based on text compression and the edit distance between words, without any prior labelling. We demonstrate that by using this method we can automatically identify patterns and segment the data into patterns that correspond to human behaviours. To evaluate the effectiveness of our proposed method, we use a dataset from a smart home and compare the labels produced by our approach with the labels assigned by a human to the activities in the dataset. We find that the results are promising and show significant improvement in the recognition accuracy over Self-Organising Maps (SOMs).
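
The edit distance mentioned above can be sketched as follows. The abstract does not say which variant the authors use, so this assumes the standard Levenshtein distance (unit-cost insertions, deletions, and substitutions); treating a "word" as a sequence of sensor-event tokens, and the token names in the usage lines, are likewise assumptions made for illustration.

```python
from typing import Hashable, Sequence

def edit_distance(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Levenshtein distance between two sequences (strings or token lists)."""
    # prev[j] holds the distance between the first i-1 items of a
    # and the first j items of b.
    prev = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            curr.append(min(prev[j] + 1,          # delete item_a
                            curr[j - 1] + 1,      # insert item_b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

# Hypothetical sensor-event "words": the second is missing one event.
making_tea = ["kettle", "cupboard", "fridge", "cup"]
noisy_tea = ["kettle", "fridge", "cup"]
print(edit_distance(making_tea, noisy_tea))
```

A small distance between two token sequences suggests they are noisy variants of the same pattern, which is the kind of grouping an unlabelled approach can exploit.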


Large Scale Spectral Clustering with Landmark-Based Representation

AAAI Conferences

Spectral clustering is one of the most popular clustering approaches. Despite its good performance, it is limited in its applicability to large-scale problems due to its high computational complexity. Recently, many approaches have been proposed to accelerate spectral clustering. Unfortunately, these methods usually sacrifice quite a lot of information from the original data and thus result in a degradation of performance. In this paper, we propose a novel approach, called Landmark-based Spectral Clustering (LSC), for large-scale clustering problems. Specifically, we select $p\ (\ll n)$ representative data points as the landmarks and represent the original data points as linear combinations of these landmarks. The spectral embedding of the data can then be efficiently computed with the landmark-based representation. The proposed algorithm scales linearly with the problem size. Extensive experiments show the effectiveness and efficiency of our approach compared to state-of-the-art methods.
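
The landmark idea can be sketched in a few lines of NumPy. This is a minimal illustration under assumptions the abstract does not state: Gaussian weights on the `r` nearest landmarks, a degree normalization of the representation, and a thin SVD to obtain the embedding. The function name and the parameters `r` and `sigma` are hypothetical, and the caller supplies the landmarks (for example, a random sample of the data).

```python
import numpy as np

def landmark_embedding(X, landmarks, k, r=3, sigma=1.0):
    """Return an (n, k) spectral embedding of X built from p landmarks.

    X: (n, d) data; landmarks: (p, d) representative points with r < p.
    """
    # Squared distances from every point to every landmark: shape (n, p).
    d2 = ((X[:, None, :] - landmarks[None, :, :]) ** 2).sum(axis=2)

    # Gaussian weights; subtracting the row minimum avoids underflow and
    # cancels out in the row normalization below.
    W = np.exp(-(d2 - d2.min(axis=1, keepdims=True)) / (2.0 * sigma ** 2))

    # Keep only the r nearest landmarks per point (sparse representation).
    far = np.argsort(d2, axis=1)[:, r:]
    np.put_along_axis(W, far, 0.0, axis=1)

    # Rows sum to 1: each point is a convex combination of its landmarks.
    Z = W / W.sum(axis=1, keepdims=True)

    # Scale columns by landmark degree, then take a thin SVD of the
    # n x p matrix. The implied n x n affinity Z_hat @ Z_hat.T is never formed.
    degree = Z.sum(axis=0)
    Z_hat = Z / np.sqrt(np.maximum(degree, 1e-12))
    U, _, _ = np.linalg.svd(Z_hat, full_matrices=False)
    return U[:, :k]

# Usage on placeholder data: two well-separated blobs, landmarks sampled at random.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (50, 2)),
               rng.normal(0.0, 0.3, (50, 2)) + 10.0])
landmarks = X[rng.choice(len(X), size=12, replace=False)]
embedding = landmark_embedding(X, landmarks, k=2)
print(embedding.shape)
```

Because the SVD is taken on an n × p matrix with p fixed and much smaller than n, the cost grows linearly in n, which matches the scaling claim above. Running k-means on the rows of `embedding` would then yield the cluster labels.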


A Nonparametric Bayesian Model of Multi-Level Category Learning

AAAI Conferences

Categories are often organized into hierarchical taxonomies, that is, tree structures where each node represents a labeled category, and a node's parent and children are, respectively, the category's supertype and subtypes. A natural question is whether it is possible to reconstruct category taxonomies in cases where we are not given explicit information about how categories are related to each other, but only a sample of observations of the members of each category. In this paper, we introduce a nonparametric Bayesian model of multi-level category learning, an extension of the hierarchical Dirichlet process (HDP) that we call the tree-HDP. We demonstrate the ability of the tree-HDP to reconstruct simulated datasets of artificial taxonomies, and show that it produces similar performance to human learners on a taxonomy inference task.


Learning Structured Embeddings of Knowledge Bases

AAAI Conferences

Many Knowledge Bases (KBs) are now readily available and encompass colossal quantities of information thanks to either a long-term funding effort (e.g. WordNet, OpenCyc) or a collaborative process (e.g. Freebase, DBpedia). However, each of them is based on a different rigorous symbolic framework which makes it hard to use their data in other systems. It is unfortunate because such rich structured knowledge might lead to a huge leap forward in many other areas of AI like natural language processing (word-sense disambiguation, natural language understanding, ...), vision (scene classification, image semantic annotation, ...) or collaborative filtering. In this paper, we present a learning process based on an innovative neural network architecture designed to embed any of these symbolic representations into a more flexible continuous vector space in which the original knowledge is kept and enhanced. These learnt embeddings would allow data from any KB to be easily used in recent machine learning methods for prediction and information retrieval. We illustrate our method on WordNet and Freebase and also present a way to adapt it to knowledge extraction from raw text.


An Online Spectral Learning Algorithm for Partially Observable Nonlinear Dynamical Systems

AAAI Conferences

Recently, a number of researchers have proposed spectral algorithms for learning models of dynamical systems — for example, Hidden Markov Models (HMMs), Partially Observable Markov Decision Processes (POMDPs), and Transformed Predictive State Representations (TPSRs). These algorithms are attractive since they are statistically consistent and not subject to local optima. However, they are batch methods: they need to store their entire training data set in memory at once and operate on it as a large matrix, and so they cannot scale to extremely large data sets (either many examples or many features per example). In turn, this restriction limits their ability to learn accurate models of complex systems. To overcome these limitations, we propose a new online spectral algorithm, which uses tricks such as incremental Singular Value Decomposition (SVD) and random projections to scale to much larger data sets and more complex systems than previous methods. We demonstrate the new method on an inertial measurement prediction task and a high-bandwidth video mapping task and we illustrate desirable behaviors such as "closing the loop," where the latent state representation changes suddenly as the learner recognizes that it has returned to a previously known place.


Bounded Forgetting

AAAI Conferences

The result of forgetting some predicates in a first-order sentence may not exist, in the sense that it might not be captured by any first-order sentence. This severely restricts the usage of forgetting in applications. To address this issue, we propose a notion called $k$-forgetting, also called bounded forgetting in general, for any fixed number $k$. We present several equivalent characterizations of bounded forgetting and show that the result of bounded forgetting, on the one hand, can always be captured by a single first-order sentence and, on the other hand, preserves the information that we are concerned with.