Goto

Collaborating Authors

 Whiteson, Shimon


Learning to Communicate with Deep Multi-Agent Reinforcement Learning

Neural Information Processing Systems

We consider the problem of multiple agents sensing and acting in environments with the goal of maximising their shared utility. In these environments, agents must learn communication protocols in order to share information that is needed to solve the tasks. By embracing deep neural networks, we are able to demonstrate end-to-end learning of protocols in complex environments inspired by communication riddles and multi-agent computer vision problems with partial observability. We propose two approaches for learning in these domains: Reinforced Inter-Agent Learning (RIAL) and Differentiable Inter-Agent Learning (DIAL). The former uses deep Q-learning, while the latter exploits the fact that, during learning, agents can backpropagate error derivatives through (noisy) communication channels. Hence, this approach uses centralised learning but decentralised execution. Our experiments introduce new environments for studying the learning of communication protocols and present a set of engineering innovations that are essential for success in these domains.


Probably Approximately Correct Greedy Maximization

arXiv.org Machine Learning

Submodular function maximization finds application in a variety of real-world decision-making problems. However, most existing methods, based on greedy maximization, assume it is computationally feasible to evaluate F, the function being maximized. Unfortunately, in many realistic settings F is too expensive to evaluate exactly even once. We present probably approximately correct greedy maximization, which requires access only to cheap anytime confidence bounds on F and uses them to prune elements. We show that, with high probability, our method returns an approximately optimal set. We propose novel, cheap confidence bounds for conditional entropy, which appears in many common choices of F and for which it is difficult to find unbiased or bounded estimates. Finally, results on a real-world dataset from a multi-camera tracking system in a shopping mall demonstrate that our approach performs comparably to existing methods, but at a fraction of the computational cost.


Copeland Dueling Bandits

Neural Information Processing Systems

A version of the dueling bandit problem is addressed in which a Condorcet winner may not exist. Two algorithms are proposed that instead seek to minimize regret with respect to the Copeland winner, which, unlike the Condorcet winner, is guaranteed to exist. The first, Copeland Confidence Bound (CCB), is designed for small numbers of arms, while the second, Scalable Copeland Bandits (SCB), works better for large-scale problems. We provide theoretical results bounding the regret accumulated by CCB and SCB, both substantially improving existing results. Such existing results either offer bounds of the form O(K log T) but require restrictive assumptions, or offer bounds of the form O(K^2 log T) without requiring such assumptions. Our results offer the best of both worlds: O(K log T) bounds without restrictive assumptions.


Exploiting Submodular Value Functions for Faster Dynamic Sensor Selection

AAAI Conferences

A key challenge in the design of multi-sensor systems is the efficient allocation of scarce resources such as bandwidth, CPU cycles, and energy, leading to the dynamic sensor selection problem in which a subset of the available sensors must be selected at each timestep. While partially observable Markov decision processes (POMDPs) provide a natural decision-theoretic model for this problem, the computational cost of POMDP planning grows exponentially in the number of sensors, making it feasible only for small problems. We propose a new POMDP planning method that uses greedy maximization to greatly improve scalability in the number of sensors. We show that, under certain conditions, the value function of a dynamic sensor selection POMDP is submodular and use this result to bound the error introduced by performing greedy maximization. Experimental results on a real-world dataset from a multi-camera tracking system in a shopping mall show it achieves similar performance to existing methods but incurs only a fraction of the computational cost, leading to much better scalability in the number of cameras.


Towards Personalised Gaming via Facial Expression Recognition

AAAI Conferences

In this paper we propose an approach for personalising the space in which a game is played (i.e., levels) dependent on classifications of the user's facial expression  — to the end of tailoring the affective game experience to the individual user. Our approach is aimed at online game personalisation, i.e., the game experience is personalised during actual play of the game. A key insight of this paper is that game personalisation techniques can leverage novel computer vision-based techniques to unobtrusively infer player experiences automatically based on facial expression analysis. Specifically, to the end of tailoring the affective game experience to the individual user, in this paper we (1) leverage the proven InSight facial expression recognition SDK as a model of the user's affective state InSight, and (2) employ this model for guiding the online game personalisation process. User studies that validate the game personalisation approach in the actual video game Infinite Mario Bros. reveal that it provides an effective basis for converging to an appropriate affective state for the individual human player.


Bounded Approximations for Linear Multi-Objective Planning Under Uncertainty

AAAI Conferences

Planning under uncertainty poses a complex problem in which multiple objectives often need to be balanced. When dealing with multiple objectives, it is often assumed that the relative importance of the objectives is known a priori. However, in practice human decision makers often find it hard to specify such preferences, and would prefer a decision support system that presents a range of possible alternatives. We propose two algorithms for computing these alternatives for the case of linearly weighted objectives. First, we propose an anytime method, approximate optimistic linear support (AOLS), that incrementally builds up a complete set of ε-optimal plans, exploiting the piecewise linear and convex shape of the value function. Second, we propose an approximate anytime method, scalarised sample incremental improvement (SSII), that employs weight sampling to focus on the most interesting regions in weight space, as suggested by a prior over preferences. We show empirically that our methods are able to produce (near-)optimal alternative sets orders of magnitude faster than existing techniques.


Exploiting Agent and Type Independence in Collaborative Graphical Bayesian Games

arXiv.org Artificial Intelligence

Efficient collaborative decision making is an important challenge for multiagent systems. Finding optimal joint actions is especially challenging when each agent has only imperfect information about the state of its environment. Such problems can be modeled as collaborative Bayesian games in which each agent receives private information in the form of its type. However, representing and solving such games requires space and computation time exponential in the number of agents. This article introduces collaborative graphical Bayesian games (CGBGs), which facilitate more efficient collaborative decision making by decomposing the global payoff function as the sum of local payoff functions that depend on only a few agents. We propose a framework for the efficient solution of CGBGs based on the insight that they posses two different types of independence, which we call agent independence and type independence. In particular, we present a factor graph representation that captures both forms of independence and thus enables efficient solutions. In addition, we show how this representation can provide leverage in sequential tasks by using it to construct a novel method for decentralized partially observable Markov decision processes. Experimental results in both random and benchmark tasks demonstrate the improved scalability of our methods compared to several existing alternatives.



Report on the 2008 Reinforcement Learning Competition

AI Magazine

This article reports on the 2008 Reinforcement Learning Competition,  which began in November 2007 and ended with a workshop at the  International Conference on Machine Learning (ICML) in July 2008 in  Helsinki, Finland.  Researchers from around the world developed  reinforcement learning agents to compete in six problems of various  complexity and difficulty.  The competition employed fundamentally  redesigned evaluation frameworks that, unlike those in previous  competitions, aimed to systematically encourage the submission of  robust learning methods. We describe the unique challenges of  empirical evaluation in reinforcement learning and briefly review  the history of the previous competitions and the evaluation  frameworks they employed.  We also describe the novel frameworks  developed for the 2008 competition as well as the software  infrastructure on which they rely.  Furthermore, we describe the six  competition domains and present a summary of selected competition  results.  Finally, we discuss the implications of these results and  outline ideas for the future of the competition.