Goto

Collaborating Authors

 University of Alberta


The First microRTS Artificial Intelligence Competition

AI Magazine

This article presents the results of the first edition of the microRTS (μRTS) AI competition, which was hosted by the IEEE Computational Intelligence in Games (CIG) 2017 conference. The goal of the competition is to spur research on AI techniques for real-time strategy (RTS) games. In this first edition, the competition received three submissions, which focused on problems such as balancing long-term and short-term search, using machine learning to learn how to play against specific opponents, and dealing with partial observability in RTS games.


Memory-Augmented Monte Carlo Tree Search

AAAI Conferences

This paper proposes and evaluates Memory-Augmented Monte Carlo Tree Search (M-MCTS), which provides a new approach to exploiting generalization in online real-time search. The key idea of M-MCTS is to augment MCTS with a memory structure in which each entry holds information about a particular state. This memory is used to generate an approximate value estimate by combining the estimates of similar states. We show that the memory-based value approximation is better than the vanilla Monte Carlo estimate with high probability under mild conditions. We evaluate M-MCTS in the game of Go. Experimental results show that M-MCTS outperforms the original MCTS with the same number of simulations.
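
To make the key idea concrete, the sketch below shows one plausible form of the memory-based value approximation: a table of per-state Monte Carlo means whose entries are combined by a similarity kernel. The feature representation, Gaussian kernel, and count weighting are illustrative assumptions, not the paper's exact construction.

    import numpy as np

    class ValueMemory:
        """Similarity-weighted value memory; an illustrative sketch, not the paper's code."""

        def __init__(self, bandwidth=1.0, capacity=10000):
            self.bandwidth = bandwidth   # kernel width: how far "similar" reaches
            self.capacity = capacity
            self.keys, self.means, self.counts = [], [], []

        def update(self, phi, mc_return):
            """Fold one simulation outcome into the entry for feature vector phi."""
            for i, key in enumerate(self.keys):
                if np.allclose(key, phi):
                    self.counts[i] += 1
                    self.means[i] += (mc_return - self.means[i]) / self.counts[i]
                    return
            if len(self.keys) < self.capacity:
                self.keys.append(np.asarray(phi, dtype=float))
                self.means.append(float(mc_return))
                self.counts.append(1)

        def estimate(self, phi):
            """Approximate a state's value by kernel-weighting similar stored states."""
            phi = np.asarray(phi, dtype=float)
            dists = np.array([np.linalg.norm(phi - k) for k in self.keys])
            weights = np.exp(-(dists / self.bandwidth) ** 2) * np.array(self.counts)
            total = weights.sum()
            return float(weights @ np.array(self.means) / total) if total > 0 else 0.0

During search, such an estimate can be blended with a node's own Monte Carlo average, shifting toward the node's own statistics as its visit count grows.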


Multi-Step Reinforcement Learning: A Unifying Algorithm

AAAI Conferences

Unifying seemingly disparate algorithmic ideas to produce better-performing algorithms has been a longstanding goal in reinforcement learning. As a primary example, TD(λ) elegantly unifies one-step TD prediction with Monte Carlo methods through the use of eligibility traces and the trace-decay parameter. Currently, there is a multitude of algorithms that can be used to perform TD control, including Sarsa, Q-learning, and Expected Sarsa. These methods are often studied in the one-step case, but they can be extended across multiple time steps to achieve better performance. Each of these algorithms is seemingly distinct, and none dominates the others for all problems. In this paper, we study a new multi-step action-value algorithm called Q(σ) that unifies and generalizes these existing algorithms, while subsuming them as special cases. A new parameter, σ, is introduced to allow the degree of sampling performed by the algorithm at each step during its backup to be continuously varied, with Sarsa existing at one extreme (full sampling) and Expected Sarsa existing at the other (pure expectation). Q(σ) is generally applicable to both on- and off-policy learning, but in this work we focus on experiments in the on-policy case. Our results show that an intermediate value of σ, which results in a mixture of the existing algorithms, performs better than either extreme. The mixture can also be varied dynamically, which can result in even greater performance.
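
The one-step case makes the role of σ easy to see: the backup target interpolates between the sampled next action value (Sarsa) and the expectation under the policy (Expected Sarsa). The sketch below is a minimal tabular illustration under assumed details (ε-greedy behavior policy, one-step target); the paper's multi-step version generalizes this.

    import numpy as np

    def epsilon_greedy_probs(q_row, epsilon):
        """Action probabilities of an epsilon-greedy policy in one state."""
        probs = np.full(len(q_row), epsilon / len(q_row))
        probs[np.argmax(q_row)] += 1.0 - epsilon
        return probs

    def q_sigma_target(Q, r, s_next, a_next, sigma, gamma, epsilon):
        """One-step Q(sigma) target: sigma=1 is Sarsa, sigma=0 is Expected Sarsa."""
        probs = epsilon_greedy_probs(Q[s_next], epsilon)
        expected = probs @ Q[s_next]        # Expected Sarsa component
        sampled = Q[s_next, a_next]         # Sarsa component
        return r + gamma * (sigma * sampled + (1.0 - sigma) * expected)

    # The usual TD update then moves the estimate toward the target:
    # Q[s, a] += alpha * (q_sigma_target(...) - Q[s, a])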


AIVAT: A New Variance Reduction Technique for Agent Evaluation in Imperfect Information Games

AAAI Conferences

Evaluating agent performance when outcomes are stochastic and agents use randomized strategies can be challenging when there is limited data available. The variance of sampled outcomes may make the simple approach of Monte Carlo sampling inadequate. This is the case for agents playing heads-up no-limit Texas hold'em poker, where man-machine competitions typically involve multiple days of consistent play by multiple players, but still can (and sometimes did) result in statistically insignificant conclusions. In this paper, we introduce AIVAT, a low-variance, provably unbiased value assessment tool that exploits an arbitrary heuristic estimate of state value, as well as the explicit strategy of a subset of the agents. Unlike existing techniques, which reduce the variance from chance events or only consider game-ending actions, AIVAT reduces the variance both from choices by nature and from players with a known strategy. The resulting estimator produces results that significantly outperform previous state-of-the-art techniques. It was able to reduce the standard deviation of a Texas hold'em poker man-machine match by 85% and consequently requires 44 times fewer games to draw the same statistical conclusion. AIVAT enabled the first statistically significant AI victory against professional poker players in no-limit hold'em. Furthermore, the technique was powerful enough to produce statistically significant results versus individual players, not just an aggregate pool of the players. We also used AIVAT to analyze a short series of AI-versus-human poker tournaments, producing statistically significant results with as few as 28 matches.
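
The unbiasedness hinges on a standard control-variate argument: at each chance event (and, in AIVAT, at each decision of a player with a known strategy), subtract a correction term whose expectation is zero. The toy below illustrates only the chance-event case with assumed names; the paper's construction for imperfect-information games is considerably more involved.

    import random

    def chance_correction(value_fn, sampled_next, next_states, probs):
        """Realized minus expected heuristic value at one chance event.
        The term has expectation zero, so subtracting it leaves the
        outcome estimator unbiased while cancelling chance variance."""
        expected = sum(p * value_fn(s) for s, p in zip(next_states, probs))
        return value_fn(sampled_next) - expected

    # Toy demo: a fair coin pays +1 on heads, -1 on tails. With a perfect
    # heuristic the corrected estimate is exactly 0 (the true expected
    # value) in every game, whereas raw payoffs swing between +1 and -1.
    value_fn = {"heads": 1.0, "tails": -1.0}.get
    estimates = []
    for _ in range(1000):
        flip = random.choice(["heads", "tails"])
        corr = chance_correction(value_fn, flip, ["heads", "tails"], [0.5, 0.5])
        estimates.append(value_fn(flip) - corr)
    print(sum(estimates) / len(estimates))   # 0.0, with zero variance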


Deep Learning for Speech Accent Detection in Videogames

AAAI Conferences

In video games, a wide range of characters make up the world players inhabit. These characters, non-player characters (NPCs), have traits, such as their appearance and speech accent, that determine certain things about them, including moral inclination, levels of trustworthiness, social class, levels of education, and ethnic background. But what does an accent say about a character in a video game? We use deep learning to train a neural network to detect speech accents and to establish the degree to which machines can be used to recognize these accents. This research aims to help sociolinguists and discourse analysts establish critical and content-analytical findings, for instance about stereotypical uses of speech accents, to better analyze who has what accent in video games, and to examine what kinds of language ideologies and social value judgments the use of accents in games constructs and perpetuates. This paper presents the results of the first deep learning experiments, which were conducted on Standard North American, British Received Pronunciation, and Spanish English. We discuss our methodological considerations and some early deep learning results, which show relatively low levels of accuracy (61%). We discuss possibilities for improving our method and for enriching our training datasets.
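
The abstract does not specify the architecture, so the following is only a generic sketch of the kind of model such an experiment could use: a small PyTorch convolutional classifier over pre-extracted audio features (e.g., MFCC frames) with three accent classes. Every detail below (feature shape, layer sizes) is an assumption, not the authors' model.

    import torch
    import torch.nn as nn

    class AccentClassifier(nn.Module):
        """Three-way accent classifier over (batch, n_features, n_frames) audio features."""

        def __init__(self, n_features=13, n_accents=3):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv1d(n_features, 32, kernel_size=5, padding=2), nn.ReLU(),
                nn.MaxPool1d(2),
                nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
                nn.AdaptiveAvgPool1d(1), nn.Flatten(),
                nn.Linear(64, n_accents),  # e.g., North American / RP / Spanish English
            )

        def forward(self, x):
            return self.net(x)

    logits = AccentClassifier()(torch.randn(8, 13, 200))   # 8 clips, 200 frames each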


Robustness of Real-Time Heuristic Search Algorithms to Read/Write Error in Externally Stored Heuristics

AAAI Conferences

Real-time heuristic search algorithms follow the agent-centered search paradigm, wherein the agent has access only to information local to its current position in the environment. This allows agents with constant-bounded computational faculties (e.g., memory) to take on search problems of progressively increasing size. As the agent's memory does not scale with the size of the search problem, the heuristic must necessarily be stored externally, in the environment. Storing the heuristic in the environment brings the extra challenge of read/write errors. In video games, artificially introducing error into the heuristic can make non-player characters (NPCs) behave more naturally. In this paper, we evaluate the effects of such errors on real-time heuristic search algorithms. In particular, we empirically study the effect of heuristic read redundancy on algorithm performance and compare it to the existing technique of using weights in heuristic learning. Finally, we evaluate a recently proposed technique of correcting the heuristic with a one-step error term in the presence of read/write error.
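
Read redundancy itself is simple to picture: perform several independent reads of the stored heuristic value and aggregate them so that individual read errors tend to cancel. The noise model (additive Gaussian) and the aggregator (median) below are illustrative assumptions, not the paper's exact error model.

    import random
    import statistics

    def noisy_read(true_h, sigma=1.0):
        """One read of an externally stored heuristic value, corrupted by read error."""
        return true_h + random.gauss(0.0, sigma)

    def redundant_read(true_h, k=5, sigma=1.0):
        """Aggregate k independent reads; the median suppresses symmetric noise."""
        return statistics.median(noisy_read(true_h, sigma) for _ in range(k))

    print(redundant_read(10.0))   # close to 10.0 far more often than a single read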


Combining Strategic Learning with Tactical Search in Real-Time Strategy Games

AAAI Conferences

A commonly used technique for managing AI complexity in real-time strategy (RTS) games is to use action and/or state abstractions. High-level abstractions can often lead to good strategic decision making, but tactical decision quality may suffer due to lost details. A competing method is to sample the search space, which often leads to good tactical performance in simple scenarios but poor high-level planning. We propose to use a deep convolutional neural network (CNN) to select among a limited set of abstract action choices, and to utilize the remaining computation time for game tree search to improve low-level tactics. The CNN is trained by supervised learning on game states labelled by Puppet Search, a strategic search algorithm that uses action abstractions. The network is then used to select a script (an abstract action) to produce low-level actions for all units. Subsequently, the game tree search algorithm improves the tactical actions of a subset of units, using a limited view of the game state that considers only units close to opponent units. Experiments in the microRTS game show that the combined algorithm achieves higher win rates than either of its two independent components and other state-of-the-art microRTS agents. To the best of our knowledge, this is the first successful application of a convolutional network to play a full RTS game on standard game maps, as previous work has focused on sub-problems, such as combat, or on very small maps.
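
The division of labor can be summarized in a few lines of control flow. The toy below is a self-contained illustration under assumed names (the unit/state model, the stubbed network, and the tactical search are all injected as callables); it mirrors the described pipeline, not the authors' implementation.

    from dataclasses import dataclass

    @dataclass
    class Unit:
        uid: int
        x: int
        y: int

    @dataclass
    class State:
        my_units: list
        enemy_units: list

    def near_enemy(state, unit, radius=4):
        """Frontline test: is any enemy within Manhattan distance 'radius'?"""
        return any(abs(unit.x - e.x) + abs(unit.y - e.y) <= radius
                   for e in state.enemy_units)

    def get_actions(state, select_script, scripts, tactical_search):
        # Strategic layer: the network picks one abstract action (a script),
        # which produces a low-level action for every unit.
        script = scripts[select_script(state)]
        actions = {u.uid: script(state, u) for u in state.my_units}
        # Tactical layer: within the remaining time budget, game tree search
        # overrides the script's actions for units close to opponent units.
        frontline = [u for u in state.my_units if near_enemy(state, u)]
        actions.update(tactical_search(state, frontline))
        return actions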


Towards Positively Surprising Non-Player Characters in Video Games

AAAI Conferences

Video games often populate their in-game worlds with numerous ambient non-player characters (NPCs). Manually crafting interesting behaviors for such characters can be prohibitively expensive. As scripted AI gets re-used across multiple characters, the characters can appear overly similar, shallow, and generally uninteresting for the player to interact with. In this paper we propose to evolve interesting behaviors in a simulated evolutionary environment. Since only some evolution runs may give rise to such behaviors, we propose to train deep neural networks to detect them. The paper presents work in progress in this direction.


Deep Learning for Real-Time Heuristic Search Algorithm Selection

AAAI Conferences

Real-time heuristic search algorithms are used to create agents that rely on local information and move within a bounded amount of time, making them excellent candidates for video games, where planning time must be controlled. Pathfinding on video-game maps has become the de facto standard for evaluating real-time heuristic search algorithms. Over the years, researchers have worked to identify areas where these algorithms perform poorly in an attempt to mitigate their weaknesses. Recent work illustrates the benefits of tailoring algorithms to a given problem, as performance is heavily dependent on the search space. Previously, to determine which algorithm to use for solving the search problems on a map, a developer would have to run all the algorithms under consideration. Our work extends the previous algorithm selection approach by using a deep learning classifier to select the algorithm to use on new maps without having to evaluate the algorithms on them. To do so, we select a portfolio of algorithms and train a classifier to predict which portfolio member to use on an unseen map. Our empirical results show that selecting algorithms dynamically can outperform the single best algorithm from the portfolio on new maps; we also provide the lower bound for potential improvements to motivate further work on this approach.
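
Concretely, the approach amounts to supervised algorithm selection over a portfolio: label each training map with its best-performing portfolio member, fit a classifier on map features, and at deployment time classify an unseen map instead of benchmarking it. The sketch below uses a nearest-neighbor stand-in for the paper's deep-learning classifier; the feature extraction and all names are assumptions.

    import numpy as np

    class PortfolioSelector:
        """Predict the best portfolio member for a map from its feature vector."""

        def fit(self, map_features, costs):
            # costs[i][j]: solution cost of portfolio algorithm j on training map i
            self.X = np.asarray(map_features, dtype=float)
            self.y = np.argmin(np.asarray(costs, dtype=float), axis=1)
            return self

        def predict(self, features):
            # Nearest-neighbor stand-in for the deep classifier in the paper:
            # copy the label of the most similar training map.
            d = np.linalg.norm(self.X - np.asarray(features, dtype=float), axis=1)
            return int(self.y[np.argmin(d)])

    selector = PortfolioSelector().fit([[0.1, 3.0], [0.9, 1.0]], [[5, 9], [7, 2]])
    print(selector.predict([0.8, 1.2]))   # -> 1: use the second algorithm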


Effects of Self-Knowledge: Once Bitten Twice Shy

AAAI Conferences

Procedurally generating rich, naturally behaving AI-controlled video game characters is an important open problem. In this paper we focus on a particular aspect of non-player character (NPC) behavior, long favored by science-fiction writers. Specifically, we study the effects of self-knowledge on NPC behavior. To do so, we adopt the well-known framework of agent-centered real-time heuristic search, applied to the standard pathfinding task on video-game maps. Such search agents normally use a heuristic function to guide them around a map to the goal state. Heuristic functions are inaccurate underestimates of the remaining distance to the goal. What if the agent somehow knew how long it (the agent) would actually take to reach the goal from each state? How would using such self-knowledge in place of a heuristic function affect the agent's behavior? We show that, as in real life, knowledge of one's irrational behavior in a situation can deter the agent from getting into that situation again, even if it is, in fact, part of an optimal solution. We demonstrate the "fear" with a simple example and empirically show that the issue is common in video-game pathfinding. We then analyze the issue theoretically and suggest that "fear" induced by self-knowledge is not a bug but a feature, and may potentially be used to develop more naturally behaving NPCs.
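
The setup can be made concrete with a single agent-centered search step. Below is a minimal LRTA*-style sketch (the names and the dict-based heuristic store are illustrative): the guiding function h can hold either an admissible heuristic or the agent's measured time-to-goal from each state, i.e., the "self-knowledge" studied in the paper.

    def lrta_star_step(state, successors, h, cost=lambda s, t: 1.0):
        """One agent-centered step: move toward argmin of cost + h, then raise h."""
        best = min(successors(state), key=lambda t: cost(state, t) + h.get(t, 0.0))
        # Learning rule: lift h(state) to the backed-up value so the agent does
        # not oscillate. When h stores the agent's own past travel times, values
        # inflated by earlier detours persist and can repel the agent ("fear"),
        # even from states that lie on an optimal path.
        h[state] = max(h.get(state, 0.0), cost(state, best) + h.get(best, 0.0))
        return best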