
 University of Alberta


Exploiting Action Impact Regularity and Exogenous State Variables for Offline Reinforcement Learning

Journal of Artificial Intelligence Research

Offline reinforcement learning--learning a policy from a batch of data--is known to be hard for general MDPs. These results motivate the need to look at specific classes of MDPs where offline reinforcement learning might be feasible. In this work, we explore a restricted class of MDPs to obtain guarantees for offline reinforcement learning. The key property, which we call Action Impact Regularity (AIR), is that actions primarily impact a part of the state (an endogenous component) and have limited impact on the remaining part of the state (an exogenous component). AIR is a strong assumption, but it nonetheless holds in a number of real-world domains, including financial markets. We discuss algorithms that exploit the AIR property, and provide a theoretical analysis for an algorithm based on Fitted-Q Iteration. Finally, we demonstrate that the algorithm outperforms existing offline reinforcement learning algorithms across different data collection policies in simulated and real-world environments where the regularity holds.
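
The Fitted-Q Iteration based algorithm is not spelled out in this abstract; as a rough illustration only, a minimal Fitted Q-Iteration sketch over offline transitions, where each state vector is assumed to concatenate its exogenous and endogenous components, could look like the following (the dataset layout and the ExtraTreesRegressor choice are assumptions, not the paper's implementation):

# Minimal Fitted Q-Iteration sketch (illustrative; not the paper's algorithm).
# Each state vector is assumed to concatenate exogenous and endogenous parts;
# under AIR, the exogenous part could be reused to evaluate counterfactual actions.
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

def fitted_q_iteration(transitions, n_actions, gamma=0.99, n_iters=50):
    """transitions: list of (state, action, reward, next_state) tuples."""
    states = np.array([t[0] for t in transitions])
    actions = np.array([t[1] for t in transitions])
    rewards = np.array([t[2] for t in transitions])
    next_states = np.array([t[3] for t in transitions])

    sa = np.column_stack([states, actions])      # regression inputs (s, a)
    q_model = None
    for _ in range(n_iters):
        if q_model is None:
            targets = rewards                    # first iteration regresses on reward
        else:
            next_q = np.column_stack([
                q_model.predict(np.column_stack(
                    [next_states, np.full(len(next_states), a)]))
                for a in range(n_actions)])
            targets = rewards + gamma * next_q.max(axis=1)
        q_model = ExtraTreesRegressor(n_estimators=50).fit(sa, targets)
    return q_model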


Multi-Agent Advisor Q-Learning

Journal of Artificial Intelligence Research

In the last decade, there have been significant advances in multi-agent reinforcement learning (MARL), but there are still numerous challenges, such as high sample complexity and slow convergence to stable policies, that need to be overcome before widespread deployment is possible. However, many real-world environments already, in practice, deploy sub-optimal or heuristic approaches for generating policies. An interesting question that arises is how best to use such approaches as advisors to help improve reinforcement learning in multi-agent domains. In this paper, we provide a principled framework for incorporating action recommendations from online suboptimal advisors in multi-agent settings. We describe the problem of ADvising Multiple Intelligent Reinforcement Agents (ADMIRAL) in nonrestrictive general-sum stochastic game environments and present two novel Q-learning based algorithms: ADMIRAL - Decision Making (ADMIRAL-DM) and ADMIRAL - Advisor Evaluation (ADMIRAL-AE), which allow us to improve learning by appropriately incorporating advice from an advisor (ADMIRAL-DM), and to evaluate the effectiveness of an advisor (ADMIRAL-AE). We analyze the algorithms theoretically and provide fixed point guarantees regarding their learning in general-sum stochastic games. Furthermore, extensive experiments illustrate that these algorithms can be used in a variety of environments, perform favourably compared to other related baselines, scale to large state-action spaces, and are robust to poor advice from advisors.
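
The ADMIRAL update rules themselves are not reproduced in this abstract; as a loose, single-agent-view illustration, Q-learning that consults an advisor's recommendation during exploration might be sketched as follows (the env and advisor interfaces are hypothetical):

# Illustrative sketch only: Q-learning that follows an advisor while exploring.
# The `env` and `advisor` objects are hypothetical; this is not the ADMIRAL-DM update.
import random
from collections import defaultdict

def q_learning_with_advisor(env, advisor, episodes=1000,
                            alpha=0.1, gamma=0.95, epsilon=0.2):
    Q = defaultdict(float)                              # Q[(state, action)]
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            if random.random() < epsilon:
                action = advisor.recommend(state)       # take the advisor's advice
            else:
                action = max(env.actions(state),
                             key=lambda a: Q[(state, a)])   # act greedily w.r.t. Q
            next_state, reward, done = env.step(action)
            best_next = max(Q[(next_state, a)] for a in env.actions(next_state))
            Q[(state, action)] += alpha * (reward + gamma * best_next
                                           - Q[(state, action)])
            state = next_state
    return Q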


A Guide to Budgeted Tree Search

AAAI Conferences

Budgeted Tree Search (BTS), a variant of Iterative Budgeted Exponential Search, is a new algorithm that matches the performance of IDA* on problems where the state space grows exponentially, but performs far better in cases where IDA* fails. The goal of this paper is to provide a detailed guide to BTS, with worked examples, to make the algorithm more accessible to practitioners in heuristic search.


Multi-Directional Search

AAAI Conferences

In the Multi-Agent Meeting (MAM) problem, the task is to find a meeting location for multiple agents, as well as a path for each agent to that location. In this paper, we introduce MM*, a Multi-Directional Search algorithm that finds the optimal meeting location under different cost functions. MM* generalizes the Meet in the Middle (MM) bidirectional search algorithm to the case of finding optimal meeting locations for multiple agents. A number of admissible heuristics are proposed and experiments demonstrate the benefits of MM*.
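
MM* itself is not detailed in this abstract; as a simple point of reference, a brute-force baseline for the meeting problem on an unweighted graph (not MM*) can be sketched by running a breadth-first search from each agent and picking the vertex that minimizes the chosen cost over agents:

# Brute-force Multi-Agent Meeting baseline (illustrative; not MM*).
# `graph` is an adjacency dict of an unweighted graph; `cost` can be sum or max.
from collections import deque

def bfs_distances(graph, source):
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def best_meeting_vertex(graph, agent_starts, cost=sum):
    per_agent = [bfs_distances(graph, s) for s in agent_starts]
    reachable_by_all = set.intersection(*(set(d) for d in per_agent))
    return min(reachable_by_all, key=lambda v: cost(d[v] for d in per_agent))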


Toward a Unified Understanding of Experience Management

AAAI Conferences

We present a new way to represent and understand experience managers — AI agents that tune the parameters of a running game to pursue a designer's goal. Existing representations of AI managers are diverse, which complicates the task of drawing useful comparisons between them. Unlike previous representations, ours uses a point of unity as its basis: every game/manager pair can be viewed as simply a game with the manager embedded inside. From this basis, we show that several common, differently represented concepts of experience management can be re-expressed in a unified way. We demonstrate our new representation concretely by comparing two different representations, Search-Based Drama Management and Generalized Experience Management, and we present the insights that we have gained from this effort.


Action Abstractions for Combinatorial Multi-Armed Bandit Tree Search

AAAI Conferences

Search algorithms based on combinatorial multi-armed bandits (CMABs) are promising for dealing with state-space sequential decision problems. However, current CMAB-based algorithms do not scale to problem domains with very large action spaces, such as real-time strategy games played on large maps. In this paper we introduce CMAB-based search algorithms that use action abstraction schemes to reduce the action space considered during search. One of the approaches we introduce uses regular action abstractions (A1N), while the other two use asymmetric action abstractions (A2N and A3N). Empirical results on MicroRTS show that A1N, A2N, and A3N are able to outperform an existing CMAB-based algorithm in matches played on large maps, and A3N is able to outperform all state-of-the-art search algorithms tested.
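
The abstraction schemes A1N, A2N, and A3N are not specified in this abstract; as a rough illustration of the underlying idea, a simplified naive-sampling step for a combinatorial bandit, where a hypothetical abstraction function prunes each unit's move set before per-unit arms are sampled, might look like:

# Simplified naive-sampling sketch for a combinatorial multi-armed bandit
# (illustrative only). `units`, `legal_moves`, `abstraction`, and `evaluate`
# are hypothetical; an action abstraction shrinks each unit's candidate moves.
import random
from collections import defaultdict

def naive_sampling(units, legal_moves, abstraction, evaluate,
                   iterations=1000, epsilon=0.3):
    value = defaultdict(float)    # running average reward per (unit, move) arm
    count = defaultdict(int)
    for _ in range(iterations):
        joint = {}
        for u in units:
            moves = abstraction(u, legal_moves(u))        # restricted move set
            if random.random() < epsilon:
                joint[u] = random.choice(moves)           # explore a per-unit arm
            else:
                joint[u] = max(moves, key=lambda m: value[(u, m)])
        reward = evaluate(joint)                          # playout of the joint action
        for u, m in joint.items():
            count[(u, m)] += 1
            value[(u, m)] += (reward - value[(u, m)]) / count[(u, m)]
    return {u: max(abstraction(u, legal_moves(u)),
                   key=lambda m: value[(u, m)]) for u in units}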


Exhaustive and Semi-Exhaustive Procedural Content Generation

AAAI Conferences

Within the area of procedural content generation (PCG), a wide range of techniques has been used to generate content. Many of these techniques use traditional artificial intelligence approaches, such as genetic algorithms, planning, and answer-set programming. One area that has not been widely explored is straightforward combinatorial search -- exhaustive enumeration of the entire design space or a significant subset thereof. This paper synthesizes literature from mathematics and other subfields of Artificial Intelligence to provide a reference for the algorithms needed when approaching exhaustive procedural content generation. It builds on this with algorithms for exhaustive search and complete examples of how they can be applied in practice.
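
As a concrete illustration of the straightforward combinatorial search discussed here, exhaustive enumeration of a parameterized design space with a validity filter can be sketched as follows (the parameter space and the validity predicate are hypothetical placeholders, not examples from the paper):

# Exhaustive enumeration sketch: walk the full Cartesian product of design
# parameters and keep the artifacts that pass a validity test.
from itertools import product

def enumerate_designs(parameter_space, is_valid):
    """parameter_space: dict mapping parameter name -> list of allowed values."""
    names = list(parameter_space)
    for values in product(*(parameter_space[n] for n in names)):
        design = dict(zip(names, values))
        if is_valid(design):
            yield design

# Example: 3 x 3 x 2 = 18 candidate levels, filtered by a simple constraint.
space = {"width": [8, 16, 32], "enemies": [0, 2, 4], "has_key": [True, False]}
levels = list(enumerate_designs(space, lambda d: d["enemies"] <= d["width"] // 8))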


Improbotics: Exploring the Imitation Game Using Machine Intelligence in Improvised Theatre

AAAI Conferences

Theatrical improvisation (impro or improv) is a demanding form of live, collaborative performance. Improv is a humorous and playful artform built on an open-ended narrative structure which simultaneously celebrates effort and failure. It is thus an ideal test bed for the development and deployment of interactive artificial intelligence (AI)-based conversational agents, or artificial improvisors. This case study introduces an improv show experiment featuring human actors and artificial improvisors. We have previously developed a deep-learning-based artificial improvisor, trained on movie subtitles, that can generate plausible, context-based lines of dialogue suitable for theatre. In this work, we have employed it to control what a subset of human actors say during an improv performance. We also give human-generated lines to a different subset of performers. All lines are delivered to the actors through headphones, and all performers wear headphones. This paper describes a Turing test, or imitation game, taking place in a theatre, with both the audience members and the performers left to guess who is a human and who is a machine. In order to test scientific hypotheses about the perception of humans versus machines, we collect anonymous feedback from volunteer performers and audience members. Our results suggest that rehearsal increases proficiency and the ability to control events in the performance. That said, consistency with real-world experience is limited by the interface and the mechanisms used to perform the show. We also show that the human-generated lines are shorter, more positive, and contain fewer difficult words, but more grammar and spelling mistakes, than the lines generated by the artificial improvisor.


The 13th AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment

AI Magazine

The 13th AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE 2017) was held at the Snowbird Ski and Summer Resort in Little Cottonwood Canyon, in the Wasatch Range of the Rocky Mountains in Salt Lake County, Utah. Along with the main conference presentations, the meeting included two tutorials, three workshops, and invited keynotes. This report summarizes the main conference. It also includes contributions from the organizers of the three workshops.


Output Encoding by Compressed Sensing for Cell Detection with Deep Convnet

AAAI Conferences

Output encoding often leads to superior accuracies in various machine learning tasks. In this paper we look at the significant task of cell detection/localization from microscopy images as a test case for output encoding. Since the output space is sparse for the cell detection problem (only a few pixel locations are cell centers), we employ compressed sensing (CS)-based output encoding here. Using random projections, CS converts the sparse output pixel space into dense and short (i.e., compressed) vectors. As a regressor, we use a deep convolutional neural network (CNN) to predict the compressed vectors. Then, applying an $L_1$-norm recovery algorithm to the predicted vectors, we recover sparse cell locations in the output pixel space. We demonstrate that CS-based output encoding provides us with the opportunity to do ensemble averaging to boost detection/localization scores. We experimentally demonstrate that the proposed CNN + CS framework (referred to as CNNCS) is competitive with or better than state-of-the-art methods on benchmark datasets for microscopy cell detection. In the AMIDA13 MICCAI grand competition, we achieve the 3rd highest F1-score among all 17 participating teams.
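
The full CNNCS pipeline is only summarized above; as a minimal illustration of the compressed-sensing encode/recover step alone, the following sketch projects a sparse cell-location vector with a random sensing matrix and recovers it with a sparse solver (the matrix, sizes, and the choice of Orthogonal Matching Pursuit as the recovery routine are assumptions for the sketch, not the paper's exact setup):

# Compressed-sensing output-encoding sketch (illustrative; not the CNNCS code).
# A sparse pixel vector marking cell centers is compressed by a random projection;
# in CNNCS a CNN regressor would be trained to predict the compressed vector, and
# a sparse-recovery algorithm reconstructs the cell locations from that prediction.
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(0)
n_pixels, n_measurements, n_cells = 1024, 128, 5

x_true = np.zeros(n_pixels)                       # sparse output: cell-center pixels
x_true[rng.choice(n_pixels, size=n_cells, replace=False)] = 1.0

A = rng.standard_normal((n_measurements, n_pixels)) / np.sqrt(n_measurements)
y = A @ x_true                                    # compressed target for the regressor

omp = OrthogonalMatchingPursuit(n_nonzero_coefs=n_cells,
                                fit_intercept=False).fit(A, y)
recovered_locations = np.flatnonzero(omp.coef_)   # indices of recovered cell centers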