Goto

Collaborating Authors

 Undirected Networks


Sparse Q-learning with Mirror Descent

arXiv.org Machine Learning

This paper explores a new framework for reinforcement learning based on online convex optimization, in particular mirror descent and related algorithms. Mirror descent can be viewed as an enhanced gradient method, particularly suited to minimization of convex functions in highdimensional spaces. Unlike traditional gradient methods, mirror descent undertakes gradient updates of weights in both the dual space and primal space, which are linked together using a Legendre transform. Mirror descent can be viewed as a proximal algorithm where the distance generating function used is a Bregman divergence. A new class of proximal-gradient based temporal-difference (TD) methods are presented based on different Bregman divergences, which are more powerful than regular TD learning. Examples of Bregman divergences that are studied include p-norm functions, and Mahalanobis distance based on the covariance of sample gradients. A new family of sparse mirror-descent reinforcement learning methods are proposed, which are able to find sparse fixed points of an l1-regularized Bellman equation at significantly less computational cost than previous methods based on second-order matrix methods. An experimental study of mirror-descent reinforcement learning is presented using discrete and continuous Markov decision processes.


Unsupervised Joint Alignment and Clustering using Bayesian Nonparametrics

arXiv.org Machine Learning

Joint alignment of a collection of functions is the process of independently transforming the functions so that they appear more similar to each other. Typically, such unsupervised alignment algorithms fail when presented with complex data sets arising from multiple modalities or make restrictive assumptions about the form of the functions or transformations, limiting their generality. We present a transformed Bayesian infinite mixture model that can simultaneously align and cluster a data set. Our model and associated learning scheme offer two key advantages: the optimal number of clusters is determined in a data-driven fashion through the use of a Dirichlet process prior, and it can accommodate any transformation function parameterized by a continuous parameter vector. As a result, it is applicable to a wide range of data types, and transformation functions. We present positive results on synthetic two-dimensional data, on a set of one-dimensional curves, and on various image data sets, showing large improvements over previous work. We discuss several variations of the model and conclude with directions for future work.


Hilbert Space Embeddings of POMDPs

arXiv.org Machine Learning

A nonparametric approach for policy learning for POMDPs is proposed. The approach represents distributions over the states, observations, and actions as embeddings in feature spaces, which are reproducing kernel Hilbert spaces. Distributions over states given the observations are obtained by applying the kernel Bayes' rule to these distribution embeddings. Policies and value functions are defined on the feature space over states, which leads to a feature space expression for the Bellman equation. Value iteration may then be used to estimate the optimal value function and associated policy. Experimental results confirm that the correct policy is learned using the feature space representation.


Semi-Supervised Classification Through the Bag-of-Paths Group Betweenness

arXiv.org Machine Learning

This paper introduces a novel, well-founded, betweenness measure, called the Bag-of-Paths (BoP) betweenness, as well as its extension, the BoP group betweenness, to tackle semisupervised classification problems on weighted directed graphs. The objective of semi-supervised classification is to assign a label to unlabeled nodes using the whole topology of the graph and the labeled nodes at our disposal. The BoP betweenness relies on a bag-of-paths framework assigning a Boltzmann distribution on the set of all possible paths through the network such that long (high-cost) paths have a low probability of being picked from the bag, while short (low-cost) paths have a high probability of being picked. Within that context, the BoP betweenness of node j is defined as the sum of the a posteriori probabilities that node j lies in-between two arbitrary nodes i, k, when picking a path starting in i and ending in k. Intuitively, a node typically receives a high betweenness if it has a large probability of appearing on paths connecting two arbitrary nodes of the network. This quantity can be computed in closed form by inverting a n x n matrix where n is the number of nodes. For the group betweenness, the paths are constrained to start and end in nodes within the same class, therefore defining a group betweenness for each class. Unlabeled nodes are then classified according to the class showing the highest group betweenness. Experiments on various real-world data sets show that BoP group betweenness outperforms all the tested state of-the-art methods. The benefit of the BoP betweenness is particularly noticeable when only a few labeled nodes are available.


Distributed Problem Solving

AI Magazine

Distributed problem solving is a subfield within multiagent systems, where agents are assumed to be part of a team and collaborate with each other to reach a common goal. In this article, we illustrate the motivations for distributed problem solving and provide an overview of two distributed problem solving models, namely distributed constraint satisfaction problems (DCSPs) and distributed constraint optimization problems (DCOPs), and some of their algorithms.


Adaptive sequential Monte Carlo by means of mixture of experts

arXiv.org Machine Learning

Appropriately designing the proposal kernel of particle filters is an issue of significant importance, since a bad choice may lead to deterioration of the particle sample and, consequently, waste of computational power. In this paper we introduce a novel algorithm adaptively approximating the so-called optimal proposal kernel by a mixture of integrated curved exponential distributions with logistic weights. This family of distributions, referred to as mixtures of experts, is broad enough to be used in the presence of multi-modality or strongly skewed distributions. The mixtures are fitted, via online-EM methods, to the optimal kernel through minimisation of the Kullback-Leibler divergence between the auxiliary target and instrumental distributions of the particle filter. At each iteration of the particle filter, the algorithm is required to solve only a single optimisation problem for the whole particle sample, yielding an algorithm with only linear complexity. In addition, we illustrate in a simulation study how the method can be successfully applied to optimal filtering in nonlinear state-space models.


Game-Based Data Capture for Player Metrics

AAAI Conferences

Player metrics are an invaluable resource for game designers and QA analysts who wish to understand players, monitor and improve game play, and test design hypotheses. Usually such metrics are collected in a straightforward manner by passively recording players; however, such an approach has several potential drawbacks. First, passive recording might fail to record metrics which correspond to an infrequent player behavior. Secondly, passive recording can be a costly, laborious, and memory intensive process, even with the aid of tools. In this paper, we explore the potential for an active approach to player metric collection which strives to collect data more efficiently, and thus with less cost. We use an online, iterative approach which models the relationship between player metrics and in-game situations probabilistically using a Markov Decision Process (MDP) and solves it for the best game configurations to run. To analyze the benefits and limitations of this approach, we implemented a system, called GAMELAB, for recording player metrics in Second Life.


Adversarial Policy Switching with Application to RTS Games

AAAI Conferences

Complex games such as RTS games are naturally formalized as Markov games. Given a Markov game, it is often possible to hand-code or learn a set of policies that capture the diversity of possible strategies. It is also often possible to hand-code or learn an abstract simulator of the game that can estimate the outcome of playing two strategies against one another from any state. We consider how to use such policy sets and simulators to make decisions in large Markov games. Prior work has considered the problem using an approach we call minimax policy switching. At each decision epoch, all policy pairs are simulated against each other from the current state, and the minimax policy is chosen and used to select actions until the next decision epoch. While intuitively appealing, we show that this switching policy can have arbitrarily poor worst case performance. In response, we describe a modified algorithm, monotone policy switching, whose worst case performance, under certain conditions, is provably no worse than the minimax fixed policy in the set. We evaluate these switching policies in both a simulated RTS game and the real game Wargus. The results show the effectiveness of policy switching when the simulator is accurate, and also highlight challenges in the face of inaccurate simulations.


Toward Narrative Schema-Based Goal Recognition Models for Interactive Narrative Environments

AAAI Conferences

Computational models for goal recognition hold great promise for enhancing the capabilities of drama managers and director agents for interactive narratives. The problem of goal recognition, and its more general form, plan recognition, have been the subjects of extensive investigation in the AI community. However, relatively little effort has been undertaken to examine goal recognition in interactive narrative. In this paper, we propose a research agenda to improve the accuracy of goal recognition models for interactive narratives using explicit representations of narrative structure inspired by the natural language processing community. We describe a particular category of narrative representations, narrative schemas, that we anticipate will effectively capture patterns of player behavior in interactive narratives and improve the accuracy of goal recognition models.


Assistant Agents for Sequential Planning Problems

AAAI Conferences

The problem of optimal planning under uncertainty in collaborative multi-agent domains is known to be deeply intractable but still demands a solution. This thesis will explore principled approximation methods that yield tractable approaches to planning for AI assistants, which allow them to understand the intentions of humans and help them achieve their goals. AI assistants are ubiquitous in video games, mak- ing them attractive domains for applying these planning techniques. However, games are also challenging domains, typically having very large state spaces and long planning horizons. The approaches in this thesis will leverage recent advances in Monte-Carlo search, approximation of stochastic dynamics by deterministic dynamics, and hierarchical action representation, to handle domains that are too complex for existing state of the art planners. These planning techniques will be demonstrated across a range of video game domains.