AITopics

Multi-task learning remains a difficult yet important problem in machine learning. In Gaussian processes, the main challenge is the definition of valid kernels (covariance functions) able to capture the relationships between different tasks. This paper presents a novel methodology to construct valid multi-task covariance functions (Mercer kernels) for Gaussian processes, allowing for a combination of kernels with different forms. The method is based on Fourier analysis and is general for arbitrary stationary covariance functions. Analytical solutions for cross covariance terms between popular forms are provided, including Matérn, squared exponential and sparse covariance functions. Experiments are conducted with both artificial and real datasets, demonstrating the benefits of the approach.
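As a rough illustration of what a valid cross covariance term looks like, the sketch below combines two one-dimensional squared exponential kernels with different length scales. The abstract gives no formulas, so the closed form used here is an assumption: it is what the spectral square-root (Fourier) construction yields for unit-variance kernels k_i(r) = exp(-r^2 / (2 l_i^2)). The function names are hypothetical.

```python
import numpy as np

def se_cross_cov(x1, x2, l1, l2):
    """Assumed 1-D cross covariance between two unit-variance squared
    exponential kernels with length scales l1 and l2."""
    r = x1[:, None] - x2[None, :]
    s = l1 ** 2 + l2 ** 2
    return np.sqrt(2.0 * l1 * l2 / s) * np.exp(-r ** 2 / s)

def multitask_cov(x, lengthscales):
    """Assemble the block covariance matrix over all tasks at inputs x."""
    blocks = [[se_cross_cov(x, x, li, lj) for lj in lengthscales]
              for li in lengthscales]
    return np.block(blocks)

x = np.linspace(0.0, 5.0, 30)
K = multitask_cov(x, [0.5, 2.0])

# A valid multi-task kernel must give a positive semi-definite matrix,
# so the smallest eigenvalue should be non-negative up to round-off.
min_eig = np.linalg.eigvalsh(K).min()
print(K.shape, min_eig)
```

When both length scales are equal, the cross term reduces to the ordinary squared exponential kernel, which is a quick sanity check on the formula.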

covariance function, cross covariance term, rn 3 2, (13 more...)

Twenty-Second International Joint Conference on Artificial Intelligence

Country:

Oceania > Australia > New South Wales > Sydney (0.04)
Asia > Middle East > Jordan (0.04)

Industry: Materials > Metals & Mining > Iron (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)

Agent-Oriented Incremental Team and Activity Recognition

Masato, Daniele (University of Aberdeen) | Norman, Timothy J. (University of Aberdeen) | Vasconcelos, Wamberto W. (University of Aberdeen) | Sycara, Katia (Carnegie Mellon University)

Monitoring team activity is beneficial when human teams cooperate in the enactment of a joint plan. Monitoring allows teams to maintain awareness of each other's progress within the plan and it enables anticipation of information needs. Humans find this difficult, particularly in time-stressed and uncertain environments. In this paper we introduce a probabilistic model, based on Conditional Random Fields, to automatically recognise the composition of teams and the team activities in relation to a plan. The team composition and activities are recognised incrementally by interpreting a stream of spatio-temporal observations.

activity recognition, conditional random field, recognition, (13 more...)

Twenty-Second International Joint Conference on Artificial Intelligence

Country:

North America > United States > California (0.04)
North America > United States > Wisconsin (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
(7 more...)

Industry:

Leisure & Entertainment (1.00)
Government > Military (0.68)

Activity Recognition with Finite State Machines

Kerr, Wesley (University of Arizona) | Tran, Anh (University of Arizona) | Cohen, Paul (University of Arizona)

This paper shows how to learn general Finite State Machine representations of activities that function as recognizers of previously unseen instances of activities. The central problem is to tell which differences between instances of activities are unimportant and may be safely ignored for the purpose of learning generalized representations of activities. We develop a novel way to find the "essential parts" of activities by a greedy kind of multiple sequence alignment, and a method to transform the resulting alignments into Finite State Machines that will accept novel instances of activities with high accuracy.
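The idea of a recognizer that ignores unimportant differences can be sketched with a very small state machine. This is not the paper's alignment method: the "essential parts" are assumed to be given already as an ordered list, and the symbol names are hypothetical. The machine advances when it sees the next essential symbol and stays in place on anything else.

```python
def build_fsm(essential):
    """Map each state index to the symbol that advances it to the next state."""
    return {i: sym for i, sym in enumerate(essential)}

def accepts(fsm, sequence):
    """Return True if the sequence contains all essential symbols in order."""
    state = 0
    final_state = len(fsm)
    for sym in sequence:
        # Non-essential symbols leave the state unchanged (implicit self-loop).
        if state < final_state and sym == fsm[state]:
            state += 1
    return state == final_state

fsm = build_fsm(["approach", "grasp", "lift"])
seen = accepts(fsm, ["approach", "pause", "grasp", "look", "lift"])
unseen = accepts(fsm, ["lift", "grasp", "approach"])
print(seen, unseen)
```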

recognizer, relation, sequence, (15 more...)

Twenty-Second International Joint Conference on Artificial Intelligence

Country: North America > United States > Arizona (0.05)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.87)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Adaptation of a Mixture of Multivariate Bernoulli Distributions

Kamthe, Ankur (University of California, Merced) | Carreira-Perpinan, Miguel Angel (University of California, Merced) | Cerpa, Alberto E. (University of California, Merced)

The mixture of multivariate Bernoulli distributions (MMB) is a statistical model for high-dimensional binary data in widespread use. Recently, the MMB has been used to model the sequence of packet receptions and losses of wireless links in sensor networks. Given an MMB trained on long data traces recorded from links of a deployed network, one can then use samples from the MMB to test different routing algorithms for as long as desired. However, learning an accurate model for a new link requires collecting long traces from it over periods of hours, a costly process in practice (e.g. because of limited battery life). We propose an algorithm that can adapt a preexisting MMB trained with extensive data to a new link from which very limited data is available. Our approach constrains the new MMB's parameters through a nonlinear transformation of the existing MMB's parameters. The transformation has a small number of parameters that are estimated using a generalized EM algorithm with an inner loop of BFGS iterations. We demonstrate the efficacy of the approach using the MNIST dataset of handwritten digits, and wireless link data from a sensor network. We show we can learn accurate models from data traces of about 1 minute, about 10 times shorter than needed if training an MMB from scratch.
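Drawing samples from an MMB, as mentioned above for testing routing algorithms, can be sketched as follows. The mixture weights, the per-component probabilities and the window length of 8 are made-up values for illustration, not parameters from the paper, and `sample_mmb` is a hypothetical helper name.

```python
import numpy as np

def sample_mmb(weights, probs, n_samples, rng):
    """Draw binary vectors from a mixture of multivariate Bernoullis.

    weights: (K,) mixing proportions that sum to 1.
    probs:   (K, D) per-component success probabilities.
    """
    # Pick a mixture component for every sample, then flip D coins with
    # that component's probabilities.
    components = rng.choice(len(weights), size=n_samples, p=weights)
    uniform = rng.random((n_samples, probs.shape[1]))
    return (uniform < probs[components]).astype(int)

rng = np.random.default_rng(0)
weights = np.array([0.7, 0.3])
probs = np.array([[0.95] * 8,   # assumed "good" link state
                  [0.20] * 8])  # assumed "bad" link state
traces = sample_mmb(weights, probs, 1000, rng)
print(traces.shape, traces.mean())
```

Each row is one window of packet receptions (1) and losses (0); the overall reception rate should sit near 0.7 * 0.95 + 0.3 * 0.20.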

adaptation, algorithm, transformation, (14 more...)

Twenty-Second International Joint Conference on Artificial Intelligence

Country:

North America > United States > California > Merced County > Merced (0.04)
North America > Greenland (0.04)

Industry: Telecommunications (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Communications > Networks (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)

Joint Feature Selection and Subspace Learning

Gu, Quanquan (University of Illinois at Urbana-Champaign) | Li, Zhenhui (University of Illinois at Urbana-Champaign) | Han, Jiawei (University of Illinois at Urbana-Champaign)

Dimensionality reduction is a very important topic in machine learning. It can be generally classified into two categories: feature selection and subspace learning. In the past decades, many methods have been proposed for dimensionality reduction. However, most of these works study feature selection and subspace learning independently. In this paper, we present a framework for joint feature selection and subspace learning. We reformulate the subspace learning problem and use the L_{2,1}-norm on the projection matrix to achieve row-sparsity, which leads to selecting relevant features and learning the transformation simultaneously. We discuss two situations of the proposed framework, and present their optimization algorithms. Experiments on benchmark face recognition data sets illustrate that the proposed framework outperforms state-of-the-art methods by a large margin.
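The link between the L_{2,1}-norm and feature selection can be seen in a few lines. The norm sums the Euclidean norms of the rows of the projection matrix, so penalising it pushes whole rows to zero, and a zero row means the corresponding input feature is unused. The matrix `W` below is a made-up example, and the helper names are hypothetical.

```python
import numpy as np

def l21_norm(W):
    """Sum of the Euclidean norms of the rows of W."""
    return np.linalg.norm(W, axis=1).sum()

def selected_features(W, tol=1e-8):
    """Indices of features whose row in W is not (numerically) zero."""
    return np.flatnonzero(np.linalg.norm(W, axis=1) > tol)

# Rows correspond to input features, columns to subspace dimensions.
W = np.array([[3.0, 4.0],
              [0.0, 0.0],
              [0.0, 5.0],
              [0.0, 0.0]])

print(l21_norm(W))           # 10.0
print(selected_features(W))  # [0 2]
```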

feature selection, matrix, subspace, (13 more...)

Twenty-Second International Joint Conference on Artificial Intelligence

Country:

North America > United States > Illinois (0.04)
North America > United States > Maryland > Baltimore (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.34)

Industry: Government > Military (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

On Trivial Solution and Scale Transfer Problems in Graph Regularized NMF

Gu, Quanquan (University of Illinois at Urbana-Champaign) | Ding, Chris (University of Texas at Arlington) | Han, Jiawei (University of Illinois at Urbana-Champaign)

Combining graph regularization with nonnegative matrix (tri-)factorization (NMF) has shown great performance improvement compared with traditional nonnegative matrix (tri-)factorization models due to its ability to utilize the geometric structure of the documents and words. In this paper, we show that these models are not well-defined and suffer from trivial solution and scale transfer problems. In order to solve these common problems, we propose two models for graph regularized nonnegative matrix (tri-)factorization, which can be applied for document clustering and co-clustering respectively. In the proposed models, a Normalized Cut-like constraint is imposed on the cluster assignment matrix to make the optimization problem well-defined. We derive a multiplicative updating algorithm for the proposed models, and prove its convergence. Experiments of clustering and co-clustering on benchmark text data sets demonstrate that the proposed models outperform the original models as well as many other state-of-the-art clustering methods.
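The scale transfer problem can be illustrated numerically. The abstract does not state the objective, so the sketch below assumes the commonly used graph regularized NMF form, ||X - U V^T||^2 + lambda * tr(V^T L V), with L a graph Laplacian over documents. Under that assumption, shrinking V by a factor c and growing U by 1/c leaves the reconstruction untouched while the regularizer shrinks by c^2. All data here are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
n_words, n_docs, k = 5, 6, 2
X = rng.random((n_words, n_docs))
U = rng.random((n_words, k))   # word factor
V = rng.random((n_docs, k))    # document (cluster assignment) factor

# Laplacian of a simple chain graph over the documents.
A = np.zeros((n_docs, n_docs))
for i in range(n_docs - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
L = np.diag(A.sum(axis=1)) - A

def objective_terms(U, V, lam=1.0):
    """Return (reconstruction error, graph regularizer) separately."""
    fit = np.linalg.norm(X - U @ V.T) ** 2
    reg = lam * np.trace(V.T @ L @ V)
    return fit, reg

c = 0.1
fit0, reg0 = objective_terms(U, V)
fit1, reg1 = objective_terms(U / c, V * c)  # factors stay nonnegative
print(fit0, fit1)  # same reconstruction error
print(reg0, reg1)  # regularizer scaled by c**2
```

Letting c go to zero drives the regularizer to zero without changing the fit, which is why some constraint on the cluster assignment matrix is needed to make the problem well-defined.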

algorithm, gu and zhou, matrix, (11 more...)

Twenty-Second International Joint Conference on Artificial Intelligence

Country:

North America > United States > Illinois (0.04)
North America > United States > Texas (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Industry: Government > Military (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.34)

Continuous Correlated Beta Processes

Goetschalckx, Robby (University of Dundee) | Poupart, Pascal (University of Waterloo) | Hoey, Jesse (University of Waterloo)

In this paper we consider a (possibly continuous) space of Bernoulli experiments. We assume that the Bernoulli distributions of the points are correlated. All evidence data comes in the form of successful or failed experiments at different points. Current state-of-the-art methods for expressing a distribution over a continuum of Bernoulli distributions use logistic Gaussian processes or Gaussian copula processes. However, both of these require computationally expensive matrix operations (cubic in the general case). We introduce a more intuitive approach, directly correlating beta distributions by sharing evidence between them according to a kernel function, an approach which has linear time complexity. The approach can easily be extended to multiple outcomes, giving a continuous correlated Dirichlet process. This approach can be used for classification (both binary and multi-class) and learning the actual probabilities of the Bernoulli distributions. We show results for a number of data sets, as well as a case-study where a mixture of continuous beta processes is used as part of an automated stroke rehabilitation system.
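One way to read "sharing evidence according to a kernel function" is as kernel-weighted pseudo-counts, sketched below. The Beta(1, 1) prior, the squared exponential kernel and the data points are assumptions made for illustration; the abstract does not specify them. Each query needs a single pass over the observations, which matches the linear time cost mentioned above.

```python
import numpy as np

def se_kernel(x, xs, length_scale):
    """Squared exponential weight between a query point and observed points."""
    return np.exp(-0.5 * ((x - xs) / length_scale) ** 2)

def beta_posterior(x, xs, outcomes, length_scale=0.5):
    """Beta parameters at x from kernel-weighted successes and failures."""
    w = se_kernel(x, xs, length_scale)
    alpha = 1.0 + np.sum(w * outcomes)        # assumed Beta(1, 1) prior
    beta = 1.0 + np.sum(w * (1 - outcomes))
    return alpha, beta

# Successes observed near x = 0, failures near x = 3.
xs = np.array([0.0, 0.1, 0.2, 3.0, 3.1])
outcomes = np.array([1, 1, 1, 0, 0])

a_near, b_near = beta_posterior(0.1, xs, outcomes)
a_far, b_far = beta_posterior(3.05, xs, outcomes)
mean_near = a_near / (a_near + b_near)
mean_far = a_far / (a_far + b_far)
print(mean_near, mean_far)
```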

beta distribution, experiment, probability, (15 more...)

Twenty-Second International Joint Conference on Artificial Intelligence

Country:

North America > United States (0.14)
North America > Canada > Ontario > Waterloo Region > Waterloo (0.04)

Genre:

Research Report > Strength Low (0.34)
Research Report > Promising Solution (0.34)
Research Report > Experimental Study > Negative Result (0.34)

Industry: Health & Medicine > Therapeutic Area (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)

Concept Labeling: Building Text Classifiers with Minimal Supervision

Chenthamarakshan, Vijil (IBM T J Watson Research Center Yorktown Heights) | Melville, Prem (IBM T J Watson Research Center Yorktown Heights) | Sindhwani, Vikas (IBM T J Watson Research Center Yorktown Heights) | Lawrence, Richard D (IBM T J Watson Research Center Yorktown Heights)

The rapid construction of supervised text classification models is becoming a pervasive need across many modern applications. To reduce human-labeling bottlenecks, many new statistical paradigms (e.g., active, semi-supervised, transfer and multi-task learning) have been vigorously pursued in recent literature with varying degrees of empirical success. Concurrently, the emergence of Web 2.0 platforms in the last decade has enabled a world-wide, collaborative human effort to construct a massive ontology of concepts with very rich, detailed and accurate descriptions. In this paper we propose a new framework to extract supervisory information from such ontologies and complement it with a shift in human effort from direct labeling of examples in the domain of interest to the much more efficient identification of concept-class associations. Through empirical studies on text categorization problems using the Wikipedia ontology, we show that this shift allows very high-quality models to be immediately induced at virtually no cost.

category, classifier, ontology, (15 more...)

Twenty-Second International Joint Conference on Artificial Intelligence

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia (0.04)
Africa (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Health & Medicine > Therapeutic Area (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.51)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.49)
(3 more...)

Increasing the Scalability of the Fitting of Generalised Block Models for Social Networks

Chan, Jeffrey (National University Ireland, Galway) | Lam, Samantha (National University Ireland, Galway) | Hayes, Conor (National University Ireland, Galway)

In recent years, the summarisation and decomposition of social networks has become increasingly popular, from community finding to role equivalence. However, these approaches concentrate on one type of model only. Generalised block modelling decomposes a network into independent, interpretable, labeled blocks, where the block labels summarise the relationship between two sets of users. Existing algorithms for fitting generalised block models do not scale beyond networks of 100 vertices. In this paper, we introduce two new algorithms, one based on genetic algorithms and the other on simulated annealing, that are at least two orders of magnitude faster than existing algorithms while obtaining similar accuracy. Using synthetic and real datasets, we demonstrate their efficiency and accuracy and show how generalised block modelling and our new approaches enable tractable network summarisation and modelling of medium sized networks.

algorithm, blockmodel, partition, (17 more...)

Twenty-Second International Joint Conference on Artificial Intelligence

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
Europe > Spain > Aragón (0.04)
Europe > Ireland > Connaught > County Galway > Galway (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Information Technology > Services (0.61)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (0.50)

Using Cases as Heuristics in Reinforcement Learning: A Transfer Learning Application

Celiberto Jr., Luiz A. (Technological Institute of Aeronautics) | Matsuura, Jackson P. (Technological Institute of Aeronautics) | Mantaras, Ramon Lopez de (Artificial Intelligence Research Institute (IIIA-CSIC)) | Bianchi, Reinaldo A. C. (Centro Universitario da FEI)

In this paper we propose to combine three AI techniques to speed up a Reinforcement Learning algorithm in a Transfer Learning problem: Case-based Reasoning, Heuristically Accelerated Reinforcement Learning and Neural Networks. To do so, we propose a new algorithm, called L3, which works in 3 stages: in the first stage, it uses Reinforcement Learning to learn how to perform one task, and stores the optimal policy for this problem as a case-base; in the second stage, it uses a Neural Network to map actions from one domain to actions in the other domain; and in the third stage, it uses the case-base learned in the first stage as heuristics to speed up the learning performance in a related, but different, task. The RL algorithm used in the first phase is the Q-learning and in the third phase is the recently proposed Case-based Heuristically ...

Another way to speed up a RL algorithm is by using Transfer Learning, a paradigm of machine learning that reuses knowledge accumulated in a previous task to speed up the learning of a novel, but related, target task [Taylor and Stone, 2009]. This paper investigates the use of the Case-Based Heuristically Accelerated Reinforcement Learning (CB-HARL) algorithm [Bianchi et al., 2009] as a means to transfer learning acquired by one agent during its training in one problem to another agent that has to learn how to solve a similar, but more complex, problem. To do so, we propose a new algorithm, called L3, which works in 3 stages: in the first stage, it uses the Q-learning algorithm [Watkins, 1989] to learn how to perform one task, and stores the optimal policy for this problem as a case-base; in the second stage, it uses a Neural Network to map actions from one domain to actions in the other domain; and in the third stage, it uses the case-base learned in the first stage as heuristics in the CB-HARL algorithm, speeding up the learning process.
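The role of a heuristic in the third stage can be sketched with tabular Q-learning. The form used here, adding a weighted heuristic term H(s, a) to Q(s, a) only when choosing the action, is an assumption about how heuristically accelerated Q-learning biases exploration; the excerpt itself gives no formula. The table sizes, the weight `xi` and the single heuristic entry standing in for a retrieved case are all hypothetical.

```python
import numpy as np

def heuristic_greedy_action(Q, H, state, xi=1.0):
    """Pick the action maximising Q plus a weighted heuristic bonus."""
    return int(np.argmax(Q[state] + xi * H[state]))

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Standard Q-learning update; the heuristic does not enter the update."""
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])

n_states, n_actions = 3, 2
Q = np.zeros((n_states, n_actions))
H = np.zeros((n_states, n_actions))
H[0, 1] = 1.0  # assumed case-base suggestion: take action 1 in state 0

action = heuristic_greedy_action(Q, H, state=0)
q_update(Q, s=0, a=action, r=1.0, s_next=1)
print(action, Q[0, action])
```

With an all-zero Q table the heuristic alone decides the first action, which is the sense in which a case-base from an earlier task can speed up early learning.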

agent, algorithm, learning, (14 more...)

Twenty-Second International Joint Conference on Artificial Intelligence

Country:

South America > Brazil (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Spain (0.04)

Genre:

Research Report (0.86)
Instructional Material > Course Syllabus & Notes (0.56)
Overview (0.55)

Industry: Leisure & Entertainment > Sports (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)