University of Alberta
Delay-Tolerant Online Convex Optimization: Unified Analysis and Adaptive-Gradient Algorithms
Joulani, Pooria (University of Alberta) | Gyorgy, Andras (Imperial College London) | Szepesvari, Csaba (University of Alberta)
We present a unified, black-box-style method for developing and analyzing online convex optimization (OCO) algorithms for full-information online learning in delayed-feedback environments. Our new, simplified analysis enables us to substantially improve upon previous work and to solve a number of open problems from the literature. Specifically, we develop and analyze asynchronous AdaGrad-style algorithms from the Follow-the-Regularized-Leader (FTRL) and Mirror-Descent family that, unlike previous works, can handle projections and adapt both to the gradients and the delays, without relying on either strong convexity or smoothness of the objective function, or data sparsity. Our unified framework builds on a natural reduction from delayed-feedback to standard (non-delayed) online learning. This reduction, together with recent unification results for OCO algorithms, allows us to analyze the regret of generic FTRL and Mirror-Descent algorithms in the delayed-feedback setting in a unified manner using standard proof techniques. In addition, the reduction is exact and can be used to obtain both upper and lower bounds on the regret in the delayed-feedback setting.
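The heart of the framework is a reduction that runs a standard (non-delayed) online learner as a black box and simply feeds it whatever feedback has arrived by each round. Below is a minimal sketch of that idea; the `predict`/`update` interface and the projected online-gradient-descent base learner are illustrative assumptions, not the paper's algorithms.

```python
import numpy as np

class OnlineGradientDescent:
    """Simple non-delayed base learner: projected online gradient descent."""

    def __init__(self, dim, lr=0.1, radius=1.0):
        self.x = np.zeros(dim)
        self.lr = lr
        self.radius = radius  # radius of the Euclidean ball used for projection

    def predict(self):
        return self.x

    def update(self, grad):
        self.x = self.x - self.lr * grad
        norm = np.linalg.norm(self.x)
        if norm > self.radius:            # project back onto the ball
            self.x *= self.radius / norm


class DelayedFeedbackReduction:
    """Black-box reduction: update the base learner with whatever delayed
    gradients have just arrived, then play its current prediction."""

    def __init__(self, base):
        self.base = base

    def play(self, arrived_gradients):
        for g in arrived_gradients:       # apply all feedback available now
            self.base.update(g)
        return self.base.predict()        # then predict for the current round


# Usage: the gradients for rounds 0 and 1 both arrive only at round 2.
learner = DelayedFeedbackReduction(OnlineGradientDescent(dim=2))
print(learner.play([]))                                              # round 0
print(learner.play([]))                                              # round 1
print(learner.play([np.array([1.0, 0.0]), np.array([0.0, 1.0])]))    # round 2
```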
On the Completeness of Best-First Search Variants That Use Random Exploration
Valenzano, Richard Anthony (University of Toronto) | Xie, Fan (University of Alberta)
While suboptimal best-first search algorithms like Greedy Best-First Search are frequently used when building automated planning systems, their greedy nature can make them susceptible to being easily misled by flawed heuristics. This weakness has motivated the development of best-first search variants like epsilon-greedy node selection, type-based exploration, and diverse best-first search, which all use random exploration to mitigate the impact of heuristic error. In this paper, we provide a theoretical justification for this increased robustness by formally analyzing how these algorithms behave on infinite graphs. In particular, we show that when using these approaches on any infinite graph, the probability of not finding a solution can be made arbitrarily small given enough time. This result is shown to hold for a class of algorithms that includes the three mentioned above, regardless of how misleading the heuristic is.
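As a concrete illustration of the kind of random exploration analyzed here, the sketch below grafts epsilon-greedy node selection onto a greedy best-first search. The graph interface (`successors`, `heuristic`, `is_goal`) and all parameter values are assumptions made for illustration, not the paper's notation or code.

```python
import heapq
import itertools
import random

def epsilon_greedy_gbfs(start, successors, heuristic, is_goal,
                        epsilon=0.1, max_expansions=1_000_000):
    counter = itertools.count()                    # tie-breaker for the heap
    open_list = [(heuristic(start), next(counter), start)]
    seen = {start}
    for _ in range(max_expansions):
        if not open_list:
            return None                            # search space exhausted
        if random.random() < epsilon:
            # Exploration: expand a uniformly random node from the open list.
            i = random.randrange(len(open_list))
            _, _, node = open_list.pop(i)
            heapq.heapify(open_list)               # restore the heap property
        else:
            # Greedy step: expand the open node with the lowest heuristic value.
            _, _, node = heapq.heappop(open_list)
        if is_goal(node):
            return node
        for child in successors(node):
            if child not in seen:
                seen.add(child)
                heapq.heappush(open_list, (heuristic(child), next(counter), child))
    return None

# Toy usage on a small graph where the heuristic prefers a dead end from "S".
graph = {"S": ["trap", "A"], "A": ["G"], "trap": [], "G": []}
h = {"S": 3, "trap": 0, "A": 2, "G": 0}
print(epsilon_greedy_gbfs("S", lambda n: graph[n], lambda n: h[n],
                          lambda n: n == "G", epsilon=0.3))
```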
Compressed Conditional Mean Embeddings for Model-Based Reinforcement Learning
Lever, Guy (University College London) | Shawe-Taylor, John (University College London) | Stafford, Ronnie (University College London) | Szepesvari, Csaba (University of Alberta)
We present a model-based approach to solving Markov decision processes (MDPs) in which the system dynamics are learned using conditional mean embeddings (CMEs). This class of methods comes with strong performance guarantees, and enables planning to be performed in an induced finite (pseudo-)MDP, which approximates the MDP but can be solved exactly using dynamic programming. Existing methods have two drawbacks: first, the size of the induced finite (pseudo-)MDP scales quadratically with the amount of data used to learn the model, which makes planning with the learned model costly in both memory and time; second, learning the CME itself with powerful kernel least-squares methods is expensive, creating a second computational bottleneck. We present an algorithm that maintains a rich kernelized CME model class but solves both problems: first, we show that the loss function for the CME model suggests a principled approach to compressing the induced (pseudo-)MDP, leading to faster planning while maintaining the guarantees; second, we propose to learn the CME model itself using fast sparse-greedy kernel regression well suited to the RL context. We demonstrate superior performance to existing methods in this class of model-based approaches on a range of MDPs.
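For context, planning in the induced finite (pseudo-)MDP amounts to exact dynamic programming over the support states of the learned model. The sketch below shows value iteration in such a finite model under an assumed representation (one weight matrix per action over n support states); it is not the authors' implementation and omits the CME learning and compression steps.

```python
import numpy as np

def pseudo_mdp_value_iteration(W, R, gamma=0.95, iters=500):
    """Exact value iteration in a finite (pseudo-)MDP over n support states.

    W[a] is an (n, n) matrix whose row i gives the weights (not necessarily a
    probability distribution) over next support states when taking action a in
    support state i; R[a] is the (n,) vector of immediate rewards."""
    n = next(iter(W.values())).shape[0]
    V = np.zeros(n)
    for _ in range(iters):
        V = np.max([R[a] + gamma * W[a] @ V for a in W], axis=0)
    return V

# Tiny illustrative model: 3 support states, 2 actions, made-up numbers.
rng = np.random.default_rng(1)
W = {a: rng.dirichlet(np.ones(3), size=3) for a in ("left", "right")}
R = {a: rng.uniform(size=3) for a in ("left", "right")}
print(pseudo_mdp_value_iteration(W, R))
```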
Using Correlated Strategies for Computing Stackelberg Equilibria in Extensive-Form Games
Cermak, Jiri (Czech Technical University in Prague) | Bosansky, Branislav (Czech Technical University in Prague) | Durkota, Karel (Czech Technical University in Prague) | Lisy, Viliam (University of Alberta) | Kiekintveld, Christopher (University of Texas at El Paso)
Strong Stackelberg Equilibrium (SSE) is a fundamental solution concept in game theory in which one player commits to a strategy, while the other player observes this commitment and plays a best response. We present a new algorithm for computing SSE for two-player extensive-form general-sum games with imperfect information (EFGs), where computing SSE is an NP-hard problem. Our algorithm is based on a correlated version of SSE, known as Stackelberg Extensive-Form Correlated Equilibrium (SEFCE). Our contribution is therefore twofold: (1) we give the first linear program for computing SEFCE in EFGs without chance, and (2) we repeatedly solve and modify this linear program in a systematic search until we arrive at an SSE. Our new algorithm outperforms the best previous algorithms by several orders of magnitude.
Factorization Ranking Model for Move Prediction in the Game of Go
Xiao, Chenjun (University of Alberta) | Müller, Martin (University of Alberta)
In this paper, we investigate the move prediction problem in the game of Go by proposing a new ranking model named the Factorization Bradley-Terry (FBT) model. This model treats move prediction as a competition among groups of features while also taking interactions between features into account. An FBT model provides a probability distribution that expresses a preference over moves; therefore, it can easily be compiled into an evaluation function and applied in a modern Go program. We propose a Stochastic Gradient Descent (SGD) algorithm to train an FBT model from expert game records, and provide two methods for fast computation of the gradient in order to speed up training. Experimental results show that our FBT model outperforms the state-of-the-art move prediction system, Latent Factor Ranking (LFR).
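To make the idea concrete, the sketch below shows how a factorized Bradley-Terry-style model can turn feature groups into a move distribution: each legal move activates a group of features, a group's log-strength combines individual feature weights with low-rank pairwise interaction terms, and a softmax over strengths gives the preference distribution. The notation, rank, and numbers are illustrative assumptions rather than the paper's; in the paper the parameters are trained by SGD to maximize the likelihood of expert moves.

```python
import numpy as np

def group_strength(features, w, V):
    """Log-strength of a feature group: individual weights plus factorized
    pairwise interactions given by inner products of factor vectors V[i]."""
    s = sum(w[i] for i in features)
    for a in range(len(features)):
        for b in range(a + 1, len(features)):
            s += V[features[a]] @ V[features[b]]
    return s

def move_distribution(candidate_groups, w, V):
    """Softmax over group strengths: a probability for each legal move."""
    scores = np.array([group_strength(g, w, V) for g in candidate_groups])
    scores -= scores.max()                    # numerical stability
    p = np.exp(scores)
    return p / p.sum()

# Example: 3 legal moves, 5 binary features, rank-2 factor vectors.
rng = np.random.default_rng(0)
w = rng.normal(size=5)        # per-feature weights
V = rng.normal(size=(5, 2))   # factor vectors for pairwise interactions
moves = [[0, 2], [1, 3, 4], [2, 4]]
print(move_distribution(moves, w, V))
```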
Bidirectional Search That Is Guaranteed to Meet in the Middle
Holte, Robert C. (University of Alberta) | Felner, Ariel (Ben-Gurion University) | Sharon, Guni (Ben-Gurion University) | Sturtevant, Nathan R. (University of Denver)
We present MM, the first bidirectional heuristic search algorithm whose forward and backward searches are guaranteed to "meet in the middle", i.e., never expand a node beyond the solution midpoint. We also present a novel framework for comparing MM, A*, and brute-force search, and identify conditions favoring each algorithm. Finally, we present experimental results that support our theoretical analysis.
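The key device behind the meet-in-the-middle guarantee is the priority used to order expansions in each direction, of the form max(f(n), 2g(n)): any node whose g-value exceeds half the optimal solution cost C* has priority above C*, so it need not be expanded before an optimal solution is found. The snippet below is only an illustration of this ordering, not the paper's implementation.

```python
def mm_priority(g, h):
    """Priority of a node in one direction of MM: max(f(n), 2*g(n))."""
    return max(g + h, 2 * g)

# On an optimal path of cost C* = 10, the midpoint node (g = 5, admissible
# h <= 5) has priority at most 10, while any node with g = 6 already has
# priority 12 > C*, so it need not be expanded before the searches meet.
print(mm_priority(5, 5), mm_priority(6, 3))   # -> 10 12
```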
Playable Experiences at AIIDE 2015
Cook, Michael (Falmouth University) | Eiserloh, Squirrel (Southern Methodist University) | Robertson, Justus (North Carolina State University) | Young, R. Michael (North Carolina State University) | Thompson, Tommy (Table Flip Games / University of Derby) | Churchill, David (Lunarch Studios / University of Alberta) | Cerny, Martin (Charles University in Prague) | Hernandez, Sergio Poo (University of Alberta) | Bulitko, Vadim (University of Alberta)
Keeping the Player on an Emotional Trajectory in Interactive Storytelling
Hernandez, Sergio Poo (University of Alberta) | Bulitko, Vadim (University of Alberta) | Spetch, Marcia (University of Alberta)
Artificial Intelligence (AI) techniques have been widely used in video games to control non-playable characters. More recently, AI has been applied to automated story generation with the objective of managing the player's experience in an interactive narrative. Such AI experience managers can generate and adapt narrative dynamically, often in response to the player's in-game actions. We implement and evaluate a recently proposed AI experience manager, PACE, which predicts the player's emotional response to a narrative event and uses such predictions to shape the narrative to keep the player on an author-supplied target emotional curve.
Using Lanchester Attrition Laws for Combat Prediction in StarCraft
Stanescu, Marius Adrian (University of Alberta) | Barriga, Nicolas (University of Alberta) | Buro, Michael (University of Alberta)
Smart decision making at the tactical level is important for Artificial Intelligence (AI) agents to perform well in the domain of real-time strategy (RTS) games. Winning battles is crucial in RTS games, and while humans can decide when and how to attack based on their experience, it is challenging for AI agents to estimate combat outcomes accurately. A few existing models address this problem in the game of StarCraft, but they come with many restrictions, such as not modeling injured units, supporting only a small number of unit types, or predicting only the winner of a fight and not the remaining army. Prediction using simulations is a popular alternative, but it is generally slow and requires extensive coding to model the game engine accurately. This paper introduces a model based on Lanchester's attrition laws that addresses the aforementioned limitations while being faster than running simulations. Unit strength values are learned via maximum likelihood estimation from past recorded battles. We present experiments that use a StarCraft simulator to generate battles for both training and testing, and show that the model makes accurate predictions. Furthermore, we implemented our method in a StarCraft bot that uses either this model or traditional simulations to decide when to attack or retreat. We present tournament results (against top bots from the 2014 AIIDE competition) comparing the performance of the two versions, and show increased winning percentages for our method.
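As a worked example of the underlying idea (a simplification of the generalized laws used in the paper), Lanchester's square law models mutual attrition as dA/dt = -beta*B and dB/dt = -alpha*A, which keeps alpha*A^2 - beta*B^2 constant and yields a closed-form prediction of the winner and its remaining army. The per-unit strength values below are made-up numbers; in the paper such strengths are learned from recorded battles by maximum likelihood estimation.

```python
import math

def square_law_outcome(a0, b0, alpha, beta):
    """dA/dt = -beta*B, dB/dt = -alpha*A  =>  alpha*A^2 - beta*B^2 is constant.

    Returns (winner, remaining_units)."""
    invariant = alpha * a0**2 - beta * b0**2
    if invariant > 0:       # army A wins the attrition battle
        return "A", math.sqrt(a0**2 - (beta / alpha) * b0**2)
    if invariant < 0:       # army B wins
        return "B", math.sqrt(b0**2 - (alpha / beta) * a0**2)
    return "draw", 0.0

# 20 units of strength 1.0 vs 14 units of strength 1.8 (illustrative only).
print(square_law_outcome(20, 14, 1.0, 1.8))
```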
Maximizing Flow as a Metacontrol in Angband
Mariusdottir, Thorey Maria (University of Alberta) | Bulitko, Vadim (University of Alberta) | Brown, Matthew (University of Alberta)
Flow is a psychological state that is reported to improve people's performance. Flow can emerge when a person's skills match the challenges of their activity. This paper applies the concept to artificial intelligence agents: we equip a decision-making agent with a metacontrol policy that guides it toward activities whose difficulty matches its skills. Consequently, we expect the agent's performance to improve. We implement and evaluate this approach in the role-playing game of Angband.
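A rough sketch of the metacontrol idea under an assumed formulation (scalar skill and difficulty estimates; not the paper's implementation): among the available activities, pick the one whose estimated difficulty is closest to the agent's current estimated skill, so that skill and challenge stay matched.

```python
def flow_metacontrol(skill, activity_difficulties):
    """Return the activity whose difficulty best matches the agent's skill."""
    return min(activity_difficulties,
               key=lambda a: abs(activity_difficulties[a] - skill))

# Example: an agent of skill 3.2 choosing a dungeon depth in a roguelike.
depths = {"depth_1": 1.0, "depth_5": 3.0, "depth_10": 6.5}
print(flow_metacontrol(3.2, depths))   # -> "depth_5"
```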