Multi-Kernel Gaussian Processes

AAAI Conferences

Multi-task learning remains a difficult yet important problem in machine learning. In Gaussian processes the main challenge is the definition of valid kernels (covariance functions) able to capture the relationships between different tasks. This paper presents a novel methodology to construct valid multi-task covariance functions (Mercer kernels) for Gaussian processes allowing for a combination of kernels with different forms. The method is based on Fourier analysis and is general for arbitrary stationary covariance functions. Analytical solutions for cross covariance terms between popular forms are provided, including Matérn, squared exponential and sparse covariance functions. Experiments are conducted with both artificial and real datasets demonstrating the benefits of the approach.
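As an illustration of what a valid cross covariance term looks like, the sketch below builds a two-task covariance matrix from two squared exponential kernels with different length scales and checks numerically that it is positive semi-definite. It assumes 1-D inputs and unit variances. The closed form used for the cross term follows from convolving the Gaussian "square roots" of the two kernels and is not taken from the paper. All function names are hypothetical.

```python
import numpy as np

def se_kernel(r, ell):
    """Squared exponential kernel exp(-r^2 / (2 ell^2)) evaluated at lags r."""
    return np.exp(-r**2 / (2.0 * ell**2))

def se_cross_kernel(r, ell1, ell2):
    """Cross covariance between two 1-D SE kernels (assumed closed form).

    Obtained by convolving the Gaussian square-root kernels of each task;
    it reduces to se_kernel when ell1 == ell2.
    """
    s = ell1**2 + ell2**2
    return np.sqrt(2.0 * ell1 * ell2 / s) * np.exp(-r**2 / s)

def joint_covariance(x, ell1, ell2):
    """Stack auto and cross covariances into one two-task matrix."""
    r = x[:, None] - x[None, :]          # pairwise lags
    k11 = se_kernel(r, ell1)
    k22 = se_kernel(r, ell2)
    k12 = se_cross_kernel(r, ell1, ell2)
    return np.block([[k11, k12], [k12.T, k22]])

x = np.linspace(0.0, 5.0, 20)
K = joint_covariance(x, ell1=0.5, ell2=2.0)
# A valid (Mercer) kernel gives no negative eigenvalues beyond round-off.
min_eig = np.linalg.eigvalsh(K).min()
```

Picking an arbitrary cross term (for example, simply reusing one of the two kernels) does not guarantee a non-negative spectrum, which is why the construction of the cross term matters.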


Agent-Oriented Incremental Team and Activity Recognition

AAAI Conferences

Monitoring team activity is beneficial when human teams cooperate in the enactment of a joint plan. Monitoring allows teams to maintain awareness of each other's progress within the plan and it enables anticipation of information needs. Humans find this difficult, particularly in time-stressed and uncertain environments. In this paper we introduce a probabilistic model, based on Conditional Random Fields, to automatically recognise the composition of teams and the team activities in relation to a plan. The team composition and activities are recognised incrementally by interpreting a stream of spatio-temporal observations.


Activity Recognition with Finite State Machines

AAAI Conferences

This paper shows how to learn general Finite State Machine representations of activities that function as recognizers of previously unseen instances of activities. The central problem is to tell which differences between instances of activities are unimportant and may be safely ignored for the purpose of learning generalized representations of activities. We develop a novel way to find the "essential parts" of activities by a greedy kind of multiple sequence alignment, and a method to transform the resulting alignments into a Finite State Machine that will accept novel instances of activities with high accuracy.


Adaptation of a Mixture of Multivariate Bernoulli Distributions

AAAI Conferences

The mixture of multivariate Bernoulli distributions (MMB) is a statistical model for high-dimensional binary data in widespread use. Recently, the MMB has been used to model the sequence of packet receptions and losses of wireless links in sensor networks. Given an MMB trained on long data traces recorded from links of a deployed network, one can then use samples from the MMB to test different routing algorithms for as long as desired. However, learning an accurate model for a new link requires collecting long traces from it over periods of hours, a costly process in practice (e.g., because of limited battery life). We propose an algorithm that can adapt a preexisting MMB trained with extensive data to a new link from which very limited data is available. Our approach constrains the new MMB's parameters through a nonlinear transformation of the existing MMB's parameters. The transformation has a small number of parameters that are estimated using a generalized EM algorithm with an inner loop of BFGS iterations. We demonstrate the efficacy of the approach using the MNIST dataset of handwritten digits, and wireless link data from a sensor network. We show we can learn accurate models from data traces of about 1 minute, about 10 times shorter than needed if training an MMB from scratch.
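The use of MMB samples as synthetic link traces can be pictured with a minimal sampler. It assumes the model is given as mixing weights plus one vector of per-bit success probabilities per component. The parameter values and the name sample_mmb are made up for illustration, and the adaptation step described in the abstract is not shown.

```python
import random

def sample_mmb(weights, prototypes, n, rng):
    """Draw n binary vectors from a mixture of multivariate Bernoullis.

    weights:    mixing proportions, one per component
    prototypes: per-component lists of P(bit = 1)
    """
    samples = []
    for _ in range(n):
        # Pick a component according to the mixing proportions.
        k = rng.choices(range(len(weights)), weights=weights)[0]
        # Sample every bit independently given the component.
        samples.append([1 if rng.random() < p else 0 for p in prototypes[k]])
    return samples

rng = random.Random(0)
weights = [0.5, 0.5]
# Degenerate probabilities so each component emits one fixed pattern.
prototypes = [[1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
traces = sample_mmb(weights, prototypes, 5, rng)
```

Here a 1 could stand for a received packet and a 0 for a lost one; with non-degenerate probabilities the same function generates arbitrarily long randomized traces.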


Joint Feature Selection and Subspace Learning

AAAI Conferences

Dimensionality reduction is a very important topic in machine learning. It can be generally classified into two categories: feature selection and subspace learning. In the past decades, many methods have been proposed for dimensionality reduction. However, most of these works study feature selection and subspace learning independently. In this paper, we present a framework for joint feature selection and subspace learning. We reformulate the subspace learning problem and use the L_{2,1}-norm on the projection matrix to achieve row-sparsity, which leads to selecting relevant features and learning the transformation simultaneously. We discuss two situations of the proposed framework, and present their optimization algorithms. Experiments on benchmark face recognition data sets illustrate that the proposed framework outperforms the state-of-the-art methods overwhelmingly.
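To make the row-sparsity idea concrete: the L_{2,1}-norm of a matrix is the sum of the Euclidean norms of its rows, so penalizing it pushes entire rows to zero, and each zero row of the projection matrix corresponds to a discarded input feature. The sketch below only evaluates the norm and reads off the surviving features. The matrix W, the tolerance, and the function names are illustrative assumptions, and the optimization algorithms of the paper are not reproduced.

```python
import numpy as np

def l21_norm(W):
    """Sum of the L2 norms of the rows of W."""
    return np.linalg.norm(W, axis=1).sum()

def selected_features(W, tol=1e-8):
    """Indices of features whose projection row is not (numerically) zero."""
    return np.flatnonzero(np.linalg.norm(W, axis=1) > tol)

# Rows = input features, columns = subspace dimensions.
W = np.array([[3.0, 4.0],
              [0.0, 0.0],   # feature 1 is dropped entirely
              [0.0, 1.0]])

norm_value = l21_norm(W)        # 5 + 0 + 1
kept = selected_features(W)     # features 0 and 2
```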


On Trivial Solution and Scale Transfer Problems in Graph Regularized NMF

AAAI Conferences

Combining graph regularization with nonnegative matrix (tri-)factorization (NMF) has shown great performance improvement compared with traditional nonnegative matrix (tri-)factorization models due to its ability to utilize the geometric structure of the documents and words. In this paper, we show that these models are not well-defined and suffer from trivial solution and scale transfer problems. In order to solve these common problems, we propose two models for graph regularized nonnegative matrix (tri-)factorization, which can be applied to document clustering and co-clustering respectively. In the proposed models, a Normalized Cut-like constraint is imposed on the cluster assignment matrix to make the optimization problem well-defined. We derive a multiplicative updating algorithm for the proposed models, and prove its convergence. Experiments of clustering and co-clustering on benchmark text data sets demonstrate that the proposed models outperform the original models as well as many other state-of-the-art clustering methods.


Continuous Correlated Beta Processes

AAAI Conferences

In this paper we consider a (possibly continuous) space of Bernoulli experiments. We assume that the Bernoulli distributions of the points are correlated. All evidence data comes in the form of successful or failed experiments at different points. Current state-of-the-art methods for expressing a distribution over a continuum of Bernoulli distributions use logistic Gaussian processes or Gaussian copula processes. However, both of these require computationally expensive matrix operations (cubic in the general case). We introduce a more intuitive approach, directly correlating beta distributions by sharing evidence between them according to a kernel function, an approach which has linear time complexity. The approach can easily be extended to multiple outcomes, giving a continuous correlated Dirichlet process. This approach can be used for classification (both binary and multi-class) and learning the actual probabilities of the Bernoulli distributions. We show results for a number of data sets, as well as a case-study where a mixture of continuous beta processes is used as part of an automated stroke rehabilitation system.
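One plausible reading of "sharing evidence according to a kernel function" is that each observed success or failure contributes a fractional pseudo-count to the beta distribution at a query point, weighted by the kernel between the two locations. The sketch below implements that reading for 1-D points. The RBF kernel, the uniform Beta(1, 1) prior, and all names are assumptions for illustration and may differ from the formulation in the paper.

```python
import math

def rbf(x, y, length_scale=1.0):
    """Kernel weight in (0, 1]; equals 1 when x == y."""
    return math.exp(-((x - y) ** 2) / (2.0 * length_scale ** 2))

def beta_posterior(x, observations, prior=(1.0, 1.0), length_scale=1.0):
    """Beta parameters at x after sharing kernel-weighted evidence.

    observations: iterable of (location, success) pairs.
    One pass over the data, so the cost per query is linear in its size.
    """
    a, b = prior
    for xi, success in observations:
        w = rbf(x, xi, length_scale)
        if success:
            a += w      # fractional success count
        else:
            b += w      # fractional failure count
    return a, b

def mean_success(x, observations, **kwargs):
    """Posterior mean probability of success at x."""
    a, b = beta_posterior(x, observations, **kwargs)
    return a / (a + b)

obs = [(0.0, True), (0.1, True), (3.0, False)]
near_successes = mean_success(0.0, obs)
near_failure = mean_success(3.0, obs)
```

No matrix inversion is involved, which is the contrast with the cubic-cost Gaussian process approaches mentioned in the abstract.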


Concept Labeling: Building Text Classifiers with Minimal Supervision

AAAI Conferences

The rapid construction of supervised text classification models is becoming a pervasive need across many modern applications. To reduce human-labeling bottlenecks, many new statistical paradigms (e.g., active, semi-supervised, transfer and multi-task learning) have been vigorously pursued in recent literature with varying degrees of empirical success. Concurrently, the emergence of Web 2.0 platforms in the last decade has enabled a world-wide, collaborative human effort to construct a massive ontology of concepts with very rich, detailed and accurate descriptions. In this paper we propose a new framework to extract supervisory information from such ontologies and complement it with a shift in human effort from direct labeling of examples in the domain of interest to the much more efficient identification of concept-class associations. Through empirical studies on text categorization problems using the Wikipedia ontology, we show that this shift allows very high-quality models to be immediately induced at virtually no cost.


Increasing the Scalability of the Fitting of Generalised Block Models for Social Networks

AAAI Conferences

In recent years, the summarisation and decomposition of social networks has become increasingly popular, from community finding to role equivalence. However, these approaches concentrate on one type of model only. Generalised block modelling decomposes a network into independent, interpretable, labeled blocks, where the block labels summarise the relationship between two sets of users. Existing algorithms for fitting generalised block models do not scale beyond networks of 100 vertices. In this paper, we introduce two new algorithms, one based on genetic algorithms and the other on simulated annealing, that are at least two orders of magnitude faster than existing algorithms while obtaining similar accuracy. Using synthetic and real datasets, we demonstrate their efficiency and accuracy and show how generalised block modelling and our new approaches enable tractable network summarisation and modelling of medium sized networks.


Using Cases as Heuristics in Reinforcement Learning: A Transfer Learning Application

AAAI Conferences

In this paper we propose to combine three AI techniques to speed up a Reinforcement Learning algorithm in a Transfer Learning problem: Case-based Reasoning, Heuristically Accelerated Reinforcement Learning and Neural Networks. To do so, we propose a new algorithm, called L3, which works in 3 stages: in the first stage, it uses Reinforcement Learning to learn how to perform one task, and stores the optimal policy for this problem as a case-base; in the second stage, it uses a Neural Network to map actions from one domain to actions in the other domain; and in the third stage, it uses the case-base learned in the first stage as heuristics to speed up the learning performance in a related, but different, task. The RL algorithm used in the first phase is Q-learning and in the third phase is the recently proposed Case-Based Heuristically Accelerated Reinforcement Learning (CB-HARL) algorithm.

Another way to speed up an RL algorithm is by using Transfer Learning, a paradigm of machine learning that reuses knowledge accumulated in a previous task to speed up the learning of a novel, but related, target task [Taylor and Stone, 2009]. This paper investigates the use of the Case-Based Heuristically Accelerated Reinforcement Learning (CB-HARL) algorithm [Bianchi et al., 2009] as a means to transfer learning acquired by one agent during its training in one problem to another agent that has to learn how to solve a similar, but more complex, problem. To do so, we propose a new algorithm, called L3, which works in 3 stages: in the first stage, it uses the Q-learning algorithm [Watkins, 1989] to learn how to perform one task, and stores the optimal policy for this problem as a case-base; in the second stage, it uses a Neural Network to map actions from one domain to actions in the other domain; and in the third stage, it uses the case-base learned in the first stage as heuristics in the CB-HARL algorithm, speeding up the learning process.
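The first and third stages can be pictured with a small tabular sketch: a standard Q-learning update, plus an action choice in which a heuristic table biases the greedy step. The additive form Q + xi * H is an assumption about how heuristics are usually combined in heuristically accelerated RL. The table H simply stands in for suggestions retrieved from the case-base, and the case retrieval and neural action mapping of L3 are not shown. All names and parameter values are hypothetical.

```python
import random

def q_update(Q, s, a, reward, s_next, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step; unseen entries default to 0."""
    best_next = max(Q.get((s_next, b), 0.0) for b in actions)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (reward + gamma * best_next - old)

def choose_action(Q, H, s, actions, xi=1.0, epsilon=0.1, rng=random):
    """Epsilon-greedy choice where the heuristic H biases the greedy step.

    H maps (state, action) to a bonus; here it plays the role of advice
    coming from a previously learned case-base (assumption).
    """
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions,
               key=lambda a: Q.get((s, a), 0.0) + xi * H.get((s, a), 0.0))

actions = ["left", "right"]
Q = {}
q_update(Q, "s0", "right", 1.0, "s1", actions)   # Q[("s0", "right")] becomes 0.1

# Without advice the learned value wins; with advice the heuristic can override it.
plain = choose_action(Q, {}, "s0", actions, epsilon=0.0)
advised = choose_action(Q, {("s0", "left"): 1.0}, "s0", actions, epsilon=0.0)
```

Because the heuristic only affects which action is tried, not the update rule, the Q-values are still learned from experienced rewards.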