The Arcade Learning Environment: An Evaluation Platform for General Agents

Journal of Artificial Intelligence Research

In this article we introduce the Arcade Learning Environment (ALE): both a challenge problem and a platform and methodology for evaluating the development of general, domain-independent AI technology. ALE provides an interface to hundreds of Atari 2600 game environments, each one different, interesting, and designed to be a challenge for human players. ALE presents significant research challenges for reinforcement learning, model learning, model-based planning, imitation learning, transfer learning, and intrinsic motivation. Most importantly, it provides a rigorous testbed for evaluating and comparing approaches to these problems. We illustrate the promise of ALE by developing and benchmarking domain-independent agents designed using well-established AI techniques for both reinforcement learning and planning. In doing so, we also propose an evaluation methodology made possible by ALE, reporting empirical results on over 55 different games. All of the software, including the benchmark agents, is publicly available.


Efficient Monte Carlo Counterfactual Regret Minimization in Games with Many Player Actions

Neural Information Processing Systems

Counterfactual Regret Minimization (CFR) is a popular, iterative algorithm for computing strategies in extensive-form games. The Monte Carlo CFR (MCCFR) variants reduce the per-iteration time cost of CFR by traversing a smaller, sampled portion of the tree. The previous most effective instances of MCCFR can still be very slow in games with many player actions, since they sample every action for a given player. In this paper, we present a new MCCFR algorithm, Average Strategy Sampling (AS), that samples a subset of the player's actions according to the player's average strategy. Our new algorithm is inspired by a new, tighter bound on the number of iterations required by CFR to converge to a given solution quality. In addition, we prove a similar, tighter bound for AS and other popular MCCFR variants.
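
To make the idea of sampling only a subset of a player's actions concrete, here is a minimal sketch of one way such sampling could look at a single information set. The abstract does not give the sampling rule, so the probability formula and the parameter names `epsilon`, `beta`, and `tau` are assumptions for illustration only, and the function `sample_actions` is hypothetical rather than the authors' implementation.

```python
import random


def sample_actions(cumulative_strategy, epsilon=0.05, beta=1.0, tau=1.0, rng=None):
    """Pick a subset of actions to traverse at one information set.

    cumulative_strategy: non-negative weights, one per action, standing in
    for the player's (unnormalized) average strategy.
    epsilon, beta, tau: hypothetical tuning parameters (assumed here):
      epsilon floors every action's sampling probability (exploration),
      beta smooths the estimate while weights are still small,
      tau scales how strongly the average strategy drives sampling.
    """
    rng = rng or random.Random()
    total = sum(cumulative_strategy)
    sampled = []
    for action, weight in enumerate(cumulative_strategy):
        denominator = beta + total
        if denominator == 0:
            prob = 1.0  # no information yet: traverse every action
        else:
            prob = (beta + tau * weight) / denominator
        prob = min(1.0, max(epsilon, prob))
        # Each action is kept independently with probability `prob`.
        if rng.random() < prob:
            sampled.append(action)
    return sampled
```

Under these assumptions, actions that carry more weight in the average strategy are traversed more often, while `epsilon` keeps rarely played actions from being ignored entirely.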


Deep Representations and Codes for Image Auto-Annotation

Neural Information Processing Systems

The task of assigning a set of relevant tags to an image is challenging due to the size and variability of tag vocabularies. Consequently, most existing algorithms focus on tag assignment and fix an often large number of hand-crafted features to describe image characteristics. In this paper we introduce a hierarchical model for learning representations of full-sized color images from the pixel level, removing the need for engineered feature representations and subsequent feature selection. We benchmark our model on the STL-10 recognition dataset, achieving state-of-the-art performance. When our features are combined with TagProp (Guillaumin et al.), we outperform or compete with existing annotation approaches that use over a dozen distinct image descriptors. Furthermore, using 256-bit codes and Hamming distance for training TagProp, we exchange only a small reduction in performance for efficient storage and fast comparisons. In our experiments, deeper architectures always outperform shallow ones.
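
The storage and speed benefit of binary codes comes from how cheaply they can be compared. The sketch below shows a Hamming distance between two codes held as Python integers, plus a brute-force nearest-code lookup. The helper names `hamming_distance` and `nearest_codes` are hypothetical, and the lookup only illustrates the comparison step; it is not the authors' TagProp training pipeline.

```python
def hamming_distance(code_a: int, code_b: int) -> int:
    """Count the bit positions in which two binary codes differ."""
    # XOR leaves a 1 exactly where the codes disagree; count those bits.
    return bin(code_a ^ code_b).count("1")


def nearest_codes(query: int, database: list, k: int = 3) -> list:
    """Return indices of the k database codes closest to the query."""
    ranked = sorted(
        range(len(database)),
        key=lambda i: hamming_distance(query, database[i]),
    )
    return ranked[:k]
```

A 256-bit code fits in 32 bytes, and each comparison is one XOR followed by a bit count, which is why such codes suit large image collections.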


Between Instruction and Reward: Human-Prompted Switching

AAAI Conferences

Intelligent systems promise to amplify, augment, and extend innate human abilities. A principal example is that of assistive rehabilitation robots: artificial intelligence and machine learning enable new electromechanical systems that restore biological functions lost through injury or illness. In order for an intelligent machine to assist a human user, it must be possible for a human to communicate their intentions and preferences to their non-human counterpart. While there are a number of techniques that a human can use to direct a machine learning system, most research to date has focused on the contrasting strategies of instruction and reward. The primary contribution of our work is to demonstrate that the middle ground between instruction and reward is a fertile space for research and immediate technological progress. To support this idea, we introduce the setting of human-prompted switching, and illustrate the successful combination of switching with interactive learning using a concrete real-world example: human control of a multi-joint robot arm. We believe techniques that fall between the domains of instruction and reward are complementary to existing approaches, and will open up new lines of rapid progress for interactive human training of machine learning systems.


Telling Interactive Player-specific Stories and Planning for It: ASD + PaSSAGE = PAST

AAAI Conferences

From Shakespeare's "Romeo and Juliet" to George Lucas' "Star Wars" to BioWare's "Jade Empire" to campfire stories to baseball commentary, story-telling is a fundamental part of entertainment. A strong narrative resonates with our minds, hearts and souls and keeps us engaged. We remember the stories of our childhood and retell them to our own children. Story-telling has delighted and saddened the human race since the beginning of time and shows no signs of slowing down. But can it be improved with technology?

Around the same time, a system called Player-Specific Stories via Automatically Generated Events (PaSSAGE) (Thue et al. 2007) was proposed, which used AI techniques to model the player as he/she experiences a narrative-rich video game. Such a continuously updated player model was used to dynamically adapt the story, tailoring it to the current player. Unlike ASD, PaSSAGE did not have any automation at the design stage and relied on a human designer to foresee all possible ways of a player breaking the story and ...


Incorporating Search Algorithms into RTS Game Agents

AAAI Conferences

Real-time strategy (RTS) games are known to be one of the most complex game genres for humans to play, as well as one of the most difficult for computer AI agents to play well. To tackle the task of applying AI to RTS games, recent techniques have focused on a divide-and-conquer approach, splitting the game into strategic components and developing separate systems to solve each. This trend gives rise to a new problem: how to tie these systems together into a functional real-time strategy game-playing agent. In this paper we discuss the architecture of UAlbertaBot, our entry into the 2011/2012 StarCraft AI competitions, and the techniques used to include heuristic-search-based AI systems for the intelligent automation of both build order planning and unit control for combat scenarios.


Procedural Game Adaptation: Framing Experience Management as Changing an MDP

AAAI Conferences

In this paper, we present the Procedural Game Adaptation (PGA) framework: a designer-controlled way to adapt the dynamics of a given video game during end-user play. When implemented, this framework produces a deterministic, online adaptation agent (called an experience manager (Riedl et al. 2011)) that automatically performs two tasks: 1) it gathers information about a game's current player, 2) it uses that information to estimate which of several different changes to the game's dynamics will maximize some player-specific value (e.g., fun, sense of influence, etc.).

Changing the dynamics of a video game (i.e., how the player's actions affect the game world) is a fundamental tool of video game design. In Pac-Man, eating a power pill allows the player to temporarily defeat the ghosts that pursue and threaten her for the vast majority of the game; in Call of Duty 4, taking the perk called "Deep Impact" allows the player's bullets to pass through certain walls without being stopped. The parameters of such changes (e.g., how much the ghosts slow down while vulnerable) are usually determined by the game's designers long before its release, with ...


Enhancing the Believability of Character Behaviors Using Non-Verbal Cues

AAAI Conferences

Characters are vital to large video game worlds as they bring a sense of life to the world. However, background characters are known to rarely exhibit any sign of motivated behavior or emotional state. We want to change this by assigning these characters emotions that can be identified through their non-verbal behavior. We feel the addition of emotion will allow players to feel more connected to the game world and make the game world more believable. This paper presents the results of an experiment to test two ways of conveying emotion: 1) through a character's gait and 2) through a character's interactions with the game world. Results from the experiment suggest that a combination of gait and interactions is the most effective method to convey emotion.


On Case Base Formation in Real-Time Heuristic Search

AAAI Conferences

Real-time heuristic search algorithms obey a constant limit on planning time per move. Agents using these algorithms can execute each move as it is computed, suggesting a strong potential for application to real-time video-game AI. Recently, a breakthrough in real-time heuristic search performance was achieved through the use of case-based reasoning. In this framework, the agent optimally solves a set of problems and stores their solutions in a case base. Then, given any new problem, it seeks a similar case in the case base and uses its solution as an aid to solve the problem at hand. A number of ad hoc approaches to the case base formation problem have been proposed and empirically shown to perform well. In this paper, we investigate a theoretically driven approach to solving the problem. We mathematically relate properties of a case base to the suboptimality of the solutions it produces and subsequently develop an algorithm that addresses these properties directly. An empirical evaluation shows our new algorithm outperforms the existing state of the art on contemporary video-game pathfinding benchmarks.


Sports Commentary Recommendation System (SCoReS): Machine Learning for Automated Narrative

AAAI Conferences

Automated sports commentary is a form of automated narrative. Sports commentary exists to keep the viewer informed and entertained. One way to entertain the viewer is by telling brief stories relevant to the game in progress. We introduce a system called the Sports Commentary Recommendation System (SCoReS) that can automatically suggest stories for commentators to tell during games. Through several user studies, we compared commentary using SCoReS to three other types of commentary and show that SCoReS adds significantly to the broadcast across several enjoyment metrics. We also collected interview data from professional sports commentators who positively evaluated a demonstration of the system. We conclude that SCoReS can be a useful broadcast tool, effective at selecting stories that add to the enjoyment and watchability of sports. SCoReS is a step toward automating sports commentary and, thus, automating narrative.