Goto

Collaborating Authors

 Undirected Networks


An Exact Dynamic Programming Solution for a Decentralized Two-Player Markov Decision Process

AAAI Conferences

We present an exact dynamic programming solution for a finite-horizon decentralized two-player Markov decision process, where player 1 only has access to its own states, while player 2 has access to both playerโ€™s states but cannot affect player 1โ€™s states. The solution is obtained by solving several centralized partially-observable Markov decision processes. We then conclude with several computational examples.


POMDP Models for Continuous Calibration of Interactive Surfaces

AAAI Conferences

On interactive surfaces, an accurate system calibration is crucial for a precise user interaction. Today, geometric distortions are eliminated by a static calibration. However, this calibration is specific to a userโ€™s posture, and parallax distortions occur if this changes (i.e. if the user moves or if multiple users take turns). Within this paper, we describe an approach to model automatic online re-calibration to cope with changing viewpoints by using Partially Observable Markov Decision Processes (POMDP). Hereby, the viewpoint is stochastically deducted from the precision of user interactions on the surface. To enable the implementation on embedded systems, a small model is defined using states and observations, which are formulated relative to the current assumed viewpoint. We show the structure of a family of models, that can be generated automatically based on the userโ€™s position probability and pointing accuracy.


Elliptical slice sampling

arXiv.org Machine Learning

Many probabilistic models introduce strong dependencies between variables using a latent multivariate Gaussian distribution or a Gaussian process. We present a new Markov chain Monte Carlo algorithm for performing inference in models with multivariate Gaussian priors. Its key properties are: 1) it has simple, generic code applicable to many models, 2) it has no free parameters, 3) it works well for a variety of Gaussian process based models. These properties make our method ideal for use while model building, removing the need to spend time deriving and tuning updates for more complex algorithms.


A Minimum Relative Entropy Controller for Undiscounted Markov Decision Processes

arXiv.org Artificial Intelligence

Adaptive control problems are notoriously difficult to solve even in the presence of plant-specific controllers. One way to by-pass the intractable computation of the optimal policy is to restate the adaptive control as the minimization of the relative entropy of a controller that ignores the true plant dynamics from an informed controller. The solution is given by the Bayesian control rule-a set of equations characterizing a stochastic adaptive controller for the class of possible plant dynamics. Here, the Bayesian control rule is applied to derive BCR-MDP, a controller to solve undiscounted Markov decision processes with finite state and action spaces and unknown dynamics. In particular, we derive a non-parametric conjugate prior distribution over the policy space that encapsulates the agent's whole relevant history and we present a Gibbs sampler to draw random policies from this distribution. Preliminary results show that BCR-MDP successfully avoids sub-optimal limit cycles due to its built-in mechanism to balance exploration versus exploitation.


Why Programming-By-Demonstration Systems Fail: Lessons Learned for Usable AI

AI Magazine

Programming by demonstration systems have long attempted to make it possible for people to program computers without writing code. However, while these systems have resulted in many publications in AI venues, none of the technologies have yet achieved widespread.adoption. Usability remains a critical barrier to their success. On the basis of lessons learned from three different programming by demonstration systems, we present a set of guidelines to consider when designing usable AI-based systems.


Learning to Explore and Exploit in POMDPs

Neural Information Processing Systems

A fundamental objective in reinforcement learning is the maintenance of a proper balance between exploration and exploitation. This problem becomes more challenging when the agent can only partially observe the states of its environment. In this paper we propose a dual-policy method for jointly learning the agent behavior and the balance between exploration exploitation, in partially observable environments. The method subsumes traditional exploration, in which the agent takes actions to gather information about the environment, and active learning, in which the agent queries an oracle for optimal actions (with an associated cost for employing the oracle). The form of the employed exploration is dictated by the specific problem. Theoretical guarantees are provided concerning the optimality of the balancing of exploration and exploitation. The effectiveness of the method is demonstrated by experimental results on benchmark problems.


Sharing Features among Dynamical Systems with Beta Processes

Neural Information Processing Systems

We propose a Bayesian nonparametric approach to relating multiple time series via a set of latent, dynamical behaviors. Using a beta process prior, we allow data-driven selection of the size of this set, as well as the pattern with which behaviors are shared among time series. Via the Indian buffet process representation of the beta process predictive distributions, we develop an exact Markov chain Monte Carlo inference method. In particular, our approach uses the sum-product algorithm to efficiently compute Metropolis-Hastings acceptance probabilities, and explores new dynamical behaviors via birth/death proposals. We validate our sampling algorithm using several synthetic datasets, and also demonstrate promising unsupervised segmentation of visual motion capture data.


Recursive Segmentation and Recognition Templates for 2D Parsing

Neural Information Processing Systems

Language and image understanding are two major goals of artificial intelligence which can both be conceptually formulated in terms of parsing the input signal into a hierarchical representation. Natural language researchers have made great progress by exploiting the 1D structure of language to design efficient polynomialtime parsing algorithms. By contrast, the two-dimensional nature of images makes it much harder to design efficient image parsers and the form of the hierarchical representations is also unclear. Attempts to adapt representations and algorithms from natural language have only been partially successful. In this paper, we propose a Hierarchical Image Model (HIM) for 2D image parsing which outputs image segmentation and object recognition.


Conditional Neural Fields

Neural Information Processing Systems

Conditional random fields (CRF) are quite successful on sequence labeling tasks such as natural language processing and biological sequence analysis. CRF models use linear potential functions to represent the relationship between input features and outputs. However, in many real-world applications such as protein structure prediction and handwriting recognition, the relationship between input features and outputs is highly complex and nonlinear, which cannot be accurately modeled by a linear function. To model the nonlinear relationship between input features and outputs we propose Conditional Neural Fields (CNF), a new conditional probabilistic graphical model for sequence labeling. Our CNF model extends CRF by adding one (or possibly several) middle layer between input features and outputs. The middle layer consists of a number of hidden parameterized gates, each acting as a local neural network node or feature extractor to capture the nonlinear relationship between input features and outputs. Therefore, conceptually this CNF model is much more expressive than the linear CRF model. To better control the complexity of the CNF model, we also present a hyperparameter optimization procedure within the evidence framework. Experiments on two widely-used benchmarks indicate that this CNF model performs significantly better than a number of popular methods. In particular, our CNF model is the best among about ten machine learning methods for protein secondary tructure prediction and also among a few of the best methods for handwriting recognition.


Hierarchical Semi-Markov Conditional Random Fields for Recursive Sequential Data

Neural Information Processing Systems

Inspired by the hierarchical hidden Markov models (HHMM), we present the hierarchical semi-Markov conditional random field (HSCRF), a generalisation of embedded undirected Markov chains to model complex hierarchical, nested Markov processes. It is parameterised in a discriminative framework and has polynomial time algorithms for learning and inference. Importantly, we develop efficient algorithms for learning and constrained inference in a partially-supervised setting, which is important issue in practice where labels can only be obtained sparsely. We demonstrate the HSCRF in two applications: (i) recognising human activities of daily living (ADLs) from indoor surveillance cameras, and (ii) noun-phrase chunking. We show that the HSCRF is capable of learning rich hierarchical models with reasonable accuracy in both fully and partially observed data cases.