Bowling, Michael
Bayesian Action Decoder for Deep Multi-Agent Reinforcement Learning
Foerster, Jakob N., Song, Francis, Hughes, Edward, Burch, Neil, Dunning, Iain, Whiteson, Shimon, Botvinick, Matthew, Bowling, Michael
When observing the actions of others, humans carry out inferences about why the others acted as they did, and what this implies about their view of the world. Humans also use the fact that their actions will be interpreted in this manner when observed by others, allowing them to act informatively and thereby communicate efficiently with others. Although learning algorithms have recently achieved superhuman performance in a number of two-player, zero-sum games, scalable multi-agent reinforcement learning algorithms that can discover effective strategies and conventions in complex, partially observable settings have proven elusive. We present the Bayesian action decoder (BAD), a new multi-agent learning method that uses an approximate Bayesian update to obtain a public belief that conditions on the actions taken by all agents in the environment. Together with the public belief, this Bayesian update effectively defines a new Markov decision process, the public belief MDP, in which the action space consists of deterministic partial policies, parameterised by deep neural networks, that can be sampled for a given public state. It exploits the fact that an agent acting only on this public belief state can still learn to use its private information if the action space is augmented to be over partial policies mapping private information into environment actions. The Bayesian update is also closely related to the theory of mind reasoning that humans carry out when observing others' actions. We first validate BAD on a proof-of-principle two-step matrix game, where it outperforms traditional policy gradient methods. We then evaluate BAD on the challenging, cooperative partial-information card game Hanabi, where in the two-player setting the method surpasses all previously published learning and hand-coded approaches.
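To make the public belief machinery concrete, here is a minimal sketch of the kind of Bayesian public update described above, in a toy setting with two possible private cards, fully public actions, and a hand-written deterministic partial policy. The card domain, variable names, and fallback behaviour are illustrative assumptions, not the paper's implementation, where partial policies are parameterised by deep neural networks and sampled in the public belief MDP.

    import numpy as np

    # Toy public-belief update: two possible private cards, two actions.
    # belief[c] = public probability that the acting agent holds card c.
    belief = np.array([0.5, 0.5])

    # A sampled deterministic partial policy: maps private card -> action.
    # (In BAD these partial policies come from a neural network sampled in
    # the public belief MDP; here it is just a lookup table.)
    partial_policy = {0: 1, 1: 0}   # card 0 -> action 1, card 1 -> action 0

    def public_belief_update(belief, partial_policy, observed_action):
        """Bayes update of the public belief after a public action is observed."""
        likelihood = np.array([
            1.0 if partial_policy[card] == observed_action else 0.0
            for card in range(len(belief))
        ])
        posterior = likelihood * belief
        total = posterior.sum()
        if total == 0.0:            # action inconsistent with the sampled policy
            return belief           # fall back to the prior (a crude safeguard)
        return posterior / total

    print(public_belief_update(belief, partial_policy, observed_action=1))
    # -> [1. 0.]: observing action 1 reveals that the agent must hold card 0.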
Generalization and Regularization in DQN
Farebrother, Jesse, Machado, Marlos C., Bowling, Michael
Deep reinforcement learning (RL) algorithms have shown an impressive ability to learn complex control policies in high-dimensional environments. However, despite the ever-increasing performance on popular benchmarks like the Arcade Learning Environment (ALE), policies learned by deep RL algorithms can struggle to generalize when evaluated in remarkably similar environments. These results are unexpected given the fact that, in supervised learning, deep neural networks often learn robust features that generalize across tasks. In this paper, we study the generalization capabilities of DQN in order to aid in understanding this mismatch between generalization in deep RL and supervised learning methods. We provide evidence suggesting that DQN overspecializes to the domain it is trained on. We then comprehensively evaluate the impact of traditional methods of regularization from supervised learning, $\ell_2$ and dropout, and of reusing learned representations to improve the generalization capabilities of DQN. We perform this study using different game modes of Atari 2600 games, a recently introduced modification for the ALE which supports slight variations of the Atari 2600 games used for benchmarking in the field. Despite regularization being largely underutilized in deep RL, we show that it can, in fact, help DQN learn more general features. These features can then be reused and fine-tuned on similar tasks, considerably improving the sample efficiency of DQN.
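As an illustration of the kind of regularization studied here, the sketch below adds an $\ell_2$ penalty (weight decay) to a semi-gradient TD(0) update for a toy linear Q-function. The linear approximator, learning rate, and penalty coefficient are assumptions made for brevity; the paper applies $\ell_2$ regularization and dropout to DQN's convolutional network and then reuses and fine-tunes the learned features.

    import numpy as np

    # l2-regularised semi-gradient TD(0) on a toy linear Q-function,
    # Q(s, a) = W[a] @ phi(s).  Constants are illustrative.
    rng = np.random.default_rng(0)
    n_features, n_actions = 8, 4
    W = rng.normal(scale=0.1, size=(n_actions, n_features))
    alpha, gamma, l2_coeff = 0.01, 0.99, 1e-3

    def regularised_td_update(phi, a, r, phi_next, done):
        global W                              # module-level weights, updated in place
        target = r + (0.0 if done else gamma * (W @ phi_next).max())
        td_error = target - W[a] @ phi
        W[a] += alpha * td_error * phi        # usual semi-gradient TD step
        W *= 1.0 - alpha * l2_coeff           # weight decay from the l2 penalty

    phi, phi_next = rng.normal(size=n_features), rng.normal(size=n_features)
    regularised_td_update(phi, a=2, r=1.0, phi_next=phi_next, done=False)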
Solving Large Extensive-Form Games with Strategy Constraints
Davis, Trevor, Waugh, Kevin, Bowling, Michael
Extensive-form games are a common model for multiagent interactions with imperfect information. In two-player zero-sum games, the typical solution concept is a Nash equilibrium over the unconstrained strategy set for each player. In many situations, however, we would like to constrain the set of possible strategies. For example, constraints are a natural way to model limited resources, risk mitigation, safety, consistency with past observations of behavior, or other secondary objectives for an agent. In small games, optimal strategies under linear constraints can be found by solving a linear program; however, state-of-the-art algorithms for solving large games cannot handle general constraints. In this work we introduce a generalized form of Counterfactual Regret Minimization that provably finds optimal strategies under any feasible set of convex constraints. We demonstrate the effectiveness of our algorithm for finding strategies that mitigate risk in security games, and for opponent modeling in poker games when given only partial observations of private information.
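For context, the sketch below shows plain regret matching at a single decision point, the per-information-set building block that CFR iterates and that this work generalizes to respect convex strategy constraints. The random payoff vector is a stand-in for counterfactual values; the constraint-handling machinery itself is not shown.

    import numpy as np

    # Regret matching at one decision point with three actions.  The sampled
    # payoff vector stands in for counterfactual values from game traversals.
    rng = np.random.default_rng(1)
    n_actions = 3
    cumulative_regret = np.zeros(n_actions)
    cumulative_strategy = np.zeros(n_actions)

    def current_strategy(regret):
        positive = np.maximum(regret, 0.0)
        total = positive.sum()
        return positive / total if total > 0 else np.full(n_actions, 1.0 / n_actions)

    for t in range(1000):
        sigma = current_strategy(cumulative_regret)
        values = rng.normal(size=n_actions)       # stand-in counterfactual values
        cumulative_regret += values - sigma @ values
        cumulative_strategy += sigma

    average_strategy = cumulative_strategy / cumulative_strategy.sum()
    print(average_strategy)   # the average strategy is the quantity CFR converges with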
Variance Reduction in Monte Carlo Counterfactual Regret Minimization (VR-MCCFR) for Extensive Form Games using Baselines
Schmid, Martin, Burch, Neil, Lanctot, Marc, Moravcik, Matej, Kadlec, Rudolf, Bowling, Michael
Learning strategies for imperfect information games from samples of interaction is a challenging problem. A common method for this setting, Monte Carlo Counterfactual Regret Minimization (MCCFR), can have slow long-term convergence rates due to high variance. In this paper, we introduce a variance reduction technique (VR-MCCFR) that applies to any sampling variant of MCCFR. Using this technique, per-iteration estimated values and updates are reformulated as a function of sampled values and state-action baselines, similar to their use in policy gradient reinforcement learning. The new formulation allows estimates to be bootstrapped from other estimates within the same episode, propagating the benefits of baselines along the sampled trajectory; the estimates remain unbiased even when bootstrapping from other estimates. Finally, we show that given a perfect baseline, the variance of the value estimates can be reduced to zero. Experimental evaluation shows that VR-MCCFR brings an order of magnitude speedup, while the empirical variance decreases by three orders of magnitude. The decreased variance allows CFR+ to be used with sampling for the first time, increasing the speedup to two orders of magnitude.
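The core estimator can be illustrated with a small sketch: the sampled action's return is importance-corrected around a baseline, unsampled actions fall back to the baseline alone, and the estimate stays unbiased. The baseline values and sampling policy below are toy assumptions; in the paper the baselines are themselves bootstrapped along the sampled trajectory.

    import numpy as np

    # Baseline-corrected action-value estimates in the spirit of VR-MCCFR.
    # Only the sampled action's entry is importance-corrected; the rest keep
    # their baseline, and the correction has zero mean, so the estimate is
    # unbiased regardless of the baseline's quality.

    def corrected_action_values(sampled_action, sampled_return, sample_prob, baseline):
        estimates = baseline.copy()              # unsampled actions: baseline only
        estimates[sampled_action] += (sampled_return - baseline[sampled_action]) / sample_prob
        return estimates

    baseline = np.array([0.2, -0.1, 0.4])        # e.g. a running average of past values
    sampling_policy = np.array([0.5, 0.25, 0.25])
    a = 0                                        # action actually sampled this iteration
    print(corrected_action_values(a, sampled_return=1.0,
                                  sample_prob=sampling_policy[a],
                                  baseline=baseline))

A baseline close to the true value makes the correction term small, which is the intuition behind the zero-variance result for a perfect baseline.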
Count-Based Exploration with the Successor Representation
Machado, Marlos C., Bellemare, Marc G., Bowling, Michael
The problem of exploration in reinforcement learning is well-understood in the tabular case and many sample-efficient algorithms are known. Nevertheless, it is often unclear how the algorithms in the tabular setting can be extended to tasks with large state-spaces where generalization is required. Recent promising developments generally depend on problem-specific density models or handcrafted features. In this paper we introduce a simple approach for exploration that allows us to develop theoretically justified algorithms in the tabular case and that also gives us intuitions for new algorithms applicable to settings where function approximation is required. Our approach and its underlying theory are based on the substochastic successor representation, a concept we develop here. While the traditional successor representation defines state generalization by the similarity of successor states, the substochastic successor representation is also able to implicitly count the number of times each state (or feature) has been observed. This extension connects two previously disjoint areas of research. We show in traditional tabular domains (RiverSwim and SixArms) that our algorithm empirically performs as well as other sample-efficient algorithms. We then describe a deep reinforcement learning algorithm inspired by these ideas and show that it matches the performance of recent pseudo-count-based methods in hard exploration Atari 2600 games.
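One simple way to turn the counting intuition into code is sketched below: a tabular successor representation learned by TD, with an exploration bonus that shrinks as a state's SR row grows. The bonus form, constants, and toy trajectory are illustrative assumptions; the paper's substochastic construction and its theoretical guarantees differ in the details.

    import numpy as np

    # Tabular successor representation learned by TD, with a bonus that is
    # large for rarely visited states (small SR rows) and shrinks with visits.
    n_states, gamma, alpha, beta = 5, 0.95, 0.1, 1.0
    psi = np.zeros((n_states, n_states))

    def sr_td_update(s, s_next):
        one_hot = np.eye(n_states)[s]
        psi[s] += alpha * (one_hot + gamma * psi[s_next] - psi[s])

    def exploration_bonus(s):
        return beta / np.sqrt(1.0 + np.abs(psi[s]).sum())

    for s, s_next in [(0, 1), (1, 2), (0, 1), (2, 3)]:   # toy trajectory
        sr_td_update(s, s_next)
    print([round(exploration_bonus(s), 3) for s in range(n_states)])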
The Effect of Planning Shape on Dyna-style Planning in High-dimensional State Spaces
Holland, G. Zacharias, Talvitie, Erik, Bowling, Michael
Dyna is an architecture for reinforcement learning agents that interleaves planning, acting, and learning in an online setting. This architecture aims to make fuller use of limited experience to achieve better performance with fewer environmental interactions. Dyna has been well studied in problems with a tabular representation of states, and has also been extended to some settings with larger state spaces that require function approximation. However, little work has studied Dyna in environments with high-dimensional state spaces like images. In Dyna, the environment model is typically used to generate one-step transitions from selected start states. We applied one-step Dyna to several games from the Arcade Learning Environment and found that the model-based updates offered surprisingly little benefit, even with a perfect model. However, when the model was used to generate longer trajectories of simulated experience, performance improved dramatically. This observation also holds when using a model that is learned from experience; even though the learned model is flawed, it can still be used to accelerate learning.
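The "planning shape" being varied can be pictured with a small tabular Dyna-Q sketch in which a fixed budget of model-generated updates is spent either on many one-step rollouts or on fewer, longer trajectories. The toy model, action set, and hyperparameters are stand-ins, not the Atari setup studied in the paper.

    import random
    from collections import defaultdict

    Q = defaultdict(float)
    model = {}                        # (s, a) -> (r, s_next): a simple learned model
    alpha, gamma, actions = 0.1, 0.99, [0, 1, 2]

    def q_update(s, a, r, s_next):
        best_next = max(Q[(s_next, b)] for b in actions)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

    def plan(num_rollouts, rollout_length):
        """Spend roughly num_rollouts * rollout_length model-based updates."""
        for _ in range(num_rollouts):
            s, a = random.choice(list(model.keys()))   # start from a visited pair
            for _ in range(rollout_length):
                r, s_next = model[(s, a)]
                q_update(s, a, r, s_next)
                s, a = s_next, random.choice(actions)
                if (s, a) not in model:                # no data for this pair yet
                    break

    model[(0, 1)] = (1.0, 2)
    model[(2, 0)] = (0.0, 0)
    plan(num_rollouts=5, rollout_length=3)   # same budget as 15 one-step rollouts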
AIVAT: A New Variance Reduction Technique for Agent Evaluation in Imperfect Information Games
Burch, Neil (University of Alberta) | Schmid, Martin (Charles University in Prague) | Moravcik, Matej (Charles University in Prague) | Morrill, Dustin (University of Alberta) | Bowling, Michael (University of Alberta)
Evaluating agent performance when outcomes are stochastic and agents use randomized strategies can be challenging when there is limited data available. The variance of sampled outcomes may make the simple approach of Monte Carlo sampling inadequate. This is the case for agents playing heads-up no-limit Texas hold'em poker, where man-machine competitions typically involve multiple days of consistent play by multiple players, but still can (and sometimes did) result in statistically insignificant conclusions. In this paper, we introduce AIVAT, a low variance, provably unbiased value assessment tool that exploits an arbitrary heuristic estimate of state value, as well as the explicit strategy of a subset of the agents. Unlike existing techniques which reduce the variance from chance events, or only consider game ending actions, AIVAT reduces the variance both from choices by nature and by players with a known strategy. The resulting estimator produces results that significantly outperform previous state-of-the-art techniques. It was able to reduce the standard deviation of a Texas hold'em poker man-machine match by 85% and consequently requires 44 times fewer games to draw the same statistical conclusion. AIVAT enabled the first statistically significant AI victory against professional poker players in no-limit hold'em. Furthermore, the technique was powerful enough to produce statistically significant results versus individual players, not just an aggregate pool of the players. We also used AIVAT to analyze a short series of AI vs human poker tournaments, producing statistically significant results with as few as 28 matches.
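The control-variate structure behind the estimator can be sketched as follows: wherever the acting distribution is known (a chance event, or a player whose strategy is available), part of the noisy outcome is replaced by its expectation under a heuristic value function, and each correction term has zero mean, so unbiasedness is preserved. The value function, probabilities, and two-decision example below are toy assumptions; the full estimator also handles decisions by players with unknown strategies.

    import numpy as np

    # AIVAT-style corrections: for each decision with a known distribution,
    # add E[value] - value(action actually taken).  Each term has zero mean
    # under that distribution, so the overall estimate stays unbiased.

    def corrected_estimate(observed_outcome, known_decisions):
        """known_decisions: list of (action probs, heuristic values, taken index)."""
        estimate = observed_outcome
        for probs, values, taken in known_decisions:
            estimate += float(np.dot(probs, values)) - values[taken]
        return estimate

    # One chance node (e.g. a card deal) and one known-strategy decision.
    decisions = [
        (np.array([0.25, 0.25, 0.5]), np.array([3.0, -1.0, 0.5]), 2),
        (np.array([0.6, 0.4]),        np.array([1.0,  2.0]),      0),
    ]
    print(corrected_estimate(observed_outcome=4.0, known_decisions=decisions))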
AIVAT: A New Variance Reduction Technique for Agent Evaluation in Imperfect Information Games
Burch, Neil (University of Alberta) | Schmid, Martin (University of Alberta) | Moravcik, Matej (University of Alberta) | Bowling, Michael (University of Alberta)
Evaluating agent performance when outcomes are stochastic and agents use randomized strategies can be challenging when there is limited data available. The variance of sampled outcomes may make the simple approach of Monte Carlo sampling inadequate. This is the case for agents playing heads-up no-limit Texas hold'em poker, where man-machine competitions have involved multiple days of consistent play and still not resulted in statistically significant conclusions even when the winner's margin is substantial. In this paper, we introduce AIVAT, a low variance, provably unbiased value assessment tool that uses an arbitrary heuristic estimate of state value, as well as the explicit strategy of a subset of the agents. Unlike existing techniques which reduce the variance from chance events, or only consider game ending actions, AIVAT reduces the variance both from choices by nature and by players with a known strategy. The resulting estimator in no-limit poker can reduce the number of hands needed to draw statistical conclusions by more than a factor of 10.
Equilibrium Approximation Quality of Current No-Limit Poker Bots
Lisy, Viliam (Czech Technical University in Prague) | Bowling, Michael (University of Alberta, Canada)
Approximating a Nash equilibrium is currently the best performing approach for creating poker-playing programs. While for the simplest variants of the game, it is possible to evaluate the quality of the approximation by computing the value of the best response strategy, this is currently not computationally feasible for larger variants of the game, such as heads-up no-limit Texas hold'em. In this paper, we present a simple and computationally inexpensive Local Best Response method for computing an approximate lower bound on the value of the best response strategy. Using this method, we show that existing poker-playing programs, based on solving abstract games, are remarkably poor Nash equilibrium approximations.
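The local best response computation can be sketched roughly as follows: maintain a range (belief) over the opponent's private hands using their known strategy, and at each decision greedily pick the action with the highest heuristic expected value against that range. The three-hand abstraction, two-action set, and value table below are toy assumptions, not the poker-specific heuristics used in the paper.

    import numpy as np

    def update_range(range_probs, opponent_strategy, observed_action):
        """Bayes update; opponent_strategy[h, a] = P(action a | hand h)."""
        posterior = opponent_strategy[:, observed_action] * range_probs
        return posterior / posterior.sum()

    def local_best_response_action(range_probs, action_values):
        """action_values[a, h]: heuristic value of our action a vs opponent hand h."""
        expected = action_values @ range_probs
        return int(np.argmax(expected)), expected

    range_probs = np.array([0.5, 0.3, 0.2])                    # 3 abstract opponent hands
    opp_strategy = np.array([[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]])
    range_probs = update_range(range_probs, opp_strategy, observed_action=1)
    action, values = local_best_response_action(
        range_probs,
        action_values=np.array([[0.0, 0.0, 0.0],     # fold: value 0 vs any hand
                                [1.0, -0.5, -2.0]]))  # call: depends on their hand
    print(range_probs, action, values)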
The Forget-me-not Process
Milan, Kieran, Veness, Joel, Kirkpatrick, James, Bowling, Michael, Koop, Anna, Hassabis, Demis
We introduce the Forget-me-not Process, an efficient, non-parametric meta-algorithm for online probabilistic sequence prediction for piecewise stationary, repeating sources. Our method works by taking a Bayesian approach to partition a stream of data into postulated task-specific segments, while simultaneously building a model for each task. We provide regret guarantees with respect to piecewise stationary data sources under the logarithmic loss, and validate the method empirically across a range of sequence prediction and task identification problems.
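A stripped-down version of the Bayesian segmentation idea is sketched below for a piecewise-stationary binary stream: one hypothesis per possible start time of the current segment, a mixture prediction, and reweighting by predictive likelihood. The constant hazard rate and KT estimators are illustrative assumptions; the Forget-me-not Process additionally remembers and reuses models of previously seen tasks, which is what gives it its name.

    import numpy as np

    hazard = 0.05                      # assumed constant probability of a new segment

    def kt_prob_of_one(ones, zeros):
        return (ones + 0.5) / (ones + zeros + 1.0)   # Krichevsky-Trofimov estimator

    weights = np.array([1.0])          # posterior over current-segment start times
    counts = [[0, 0]]                  # [zeros, ones] for each hypothesis

    for x in [1, 1, 1, 1, 0, 0, 0, 0]:
        counts.append([0, 0])                        # hypothesis: a new segment starts now
        weights = np.append(weights * (1.0 - hazard), weights.sum() * hazard)
        preds = np.array([kt_prob_of_one(c[1], c[0]) for c in counts])
        print('P(x=1) =', round(float(np.dot(weights, preds)), 3))
        weights *= preds if x == 1 else 1.0 - preds  # reweight by likelihood of x
        weights /= weights.sum()
        for c in counts:                             # every segment model observes x
            c[x] += 1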