
Collaborating Authors

Bowling, Michael


Approximate Linear Programming for Constrained Partially Observable Markov Decision Processes

AAAI Conferences

In many situations, it is desirable to optimize a sequence of decisions by maximizing a primary objective while respecting constraints on secondary objectives. Such problems can be naturally modeled as constrained partially observable Markov decision processes (CPOMDPs) when the environment is partially observable. In this work, we describe a technique based on approximate linear programming to optimize policies in CPOMDPs. The optimization is performed offline and produces a finite-state controller with desirable performance guarantees. The approach outperforms a constrained version of point-based value iteration on a suite of benchmark problems.
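
As a rough illustration of the linear-programming core, the sketch below solves the occupancy-measure LP for a small, fully observable constrained MDP: maximize expected discounted reward subject to a bound on expected discounted cost. This is a simplification, not the paper's method, which handles partial observability and produces a finite-state controller; all sizes, dynamics, and the cost bound are made-up toy values, and NumPy/SciPy are assumed.

```python
# Sketch: occupancy-measure LP for a toy *fully observable* constrained MDP.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
S, A = 3, 2                      # tiny toy state/action spaces (assumed)
gamma = 0.95                     # discount factor
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a, s'] transition probs
R = rng.random((S, A))           # primary reward (toy)
C = rng.random((S, A))           # secondary cost (toy)
# Cost bound chosen so the problem is feasible by construction:
# picking the cheapest action in every state always satisfies it.
c_max = C.min(axis=1).max() / (1 - gamma)
mu0 = np.full(S, 1.0 / S)        # initial state distribution

# Variables: occupancy measure mu(s, a) >= 0, flattened to length S*A.
# Flow constraint for each s':
#   sum_a mu(s', a) - gamma * sum_{s,a} P[s, a, s'] * mu(s, a) = mu0(s')
A_eq = np.zeros((S, S * A))
for sp in range(S):
    for a in range(A):
        A_eq[sp, sp * A + a] += 1.0
    for s in range(S):
        for a in range(A):
            A_eq[sp, s * A + a] -= gamma * P[s, a, sp]

# Secondary-objective constraint: sum_{s,a} C(s,a) * mu(s,a) <= c_max
res = linprog(-R.reshape(-1),                  # linprog minimizes
              A_ub=C.reshape(1, -1), b_ub=[c_max],
              A_eq=A_eq, b_eq=mu0,
              bounds=[(0, None)] * (S * A))
mu = res.x.reshape(S, A)
policy = mu / mu.sum(axis=1, keepdims=True)    # pi(a|s) from occupancy
print("constrained-optimal policy:\n", policy)
```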


Improving Exploration in UCT Using Local Manifolds

AAAI Conferences

Monte-Carlo planning has been proven successful in many sequential decision-making settings, but it suffers from poor exploration when the rewards are sparse. In this paper, we improve exploration in UCT by generalizing across similar states using a given distance metric. We show that this algorithm, like UCT, converges asymptotically to the optimal action. When the state space does not have a natural distance metric, we show how we can learn a local manifold from the transition graph of states in the near future to obtain a distance metric. On domains inspired by video games, empirical evidence shows that our algorithm is more sample efficient than UCT, particularly when rewards are sparse.
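
The sketch below illustrates only the generalization idea, in hedged form: UCB1 statistics shared across states through a distance kernel, assuming a distance function `d` is already given. The paper's contribution of learning that metric from a local manifold of the transition graph is not shown; `kernel`, `update`, and `ucb_action` are illustrative names, not the paper's API.

```python
# Sketch: UCB1 with kernel-weighted generalization across similar states.
import math
from collections import defaultdict

counts = defaultdict(float)   # (state, action) -> kernel-weighted count
values = defaultdict(float)   # (state, action) -> kernel-weighted return sum
seen_states = set()

def kernel(s, s2, d, bandwidth=1.0):
    """Similarity weight derived from the (given or learned) metric d."""
    return math.exp(-d(s, s2) ** 2 / bandwidth)

def update(s, a, ret, d):
    """After a rollout from s returns `ret`, credit all similar seen states."""
    seen_states.add(s)
    for s2 in seen_states:
        w = kernel(s, s2, d)
        counts[(s2, a)] += w
        values[(s2, a)] += w * ret

def ucb_action(s, actions, d, c=1.4):
    total = sum(counts[(s, a)] for a in actions)
    def score(a):
        n = counts[(s, a)]
        if n == 0:
            return float("inf")            # try untried actions first
        return values[(s, a)] / n + c * math.sqrt(math.log(total + 1.0) / n)
    return max(actions, key=score)

# Toy usage: 1-D states with absolute-difference distance (assumed).
d = lambda s, s2: abs(s - s2)
update(0.0, "left", 1.0, d)
update(0.2, "left", 0.0, d)
print(ucb_action(0.1, ["left", "right"], d))
```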


Solving Games with Functional Regret Estimation

AAAI Conferences

We propose a novel online learning method for minimizing regret in large extensive-form games. The approach learns a function approximator online to estimate the regret for choosing a particular action. A no-regret algorithm uses these estimates in place of the true regrets to define a sequence of policies. We prove the approach sound by providing a bound relating the quality of the function approximation to the regret of the algorithm. A corollary is that the method is guaranteed to converge to a Nash equilibrium in self-play so long as the regrets are ultimately realizable by the function approximator. Our technique can be understood as a principled generalization of existing work on abstraction in large games; in our work, both the abstraction and the equilibrium are learned during self-play. We demonstrate empirically that the method achieves higher-quality strategies than state-of-the-art abstraction techniques given the same resources.
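
A minimal sketch of the core loop, under assumptions: regret matching where the cumulative regrets come from a linear function approximator trained online, rather than from a table. The feature representation and regret targets below are placeholders conveying the flavor of the method, not its specification.

```python
# Sketch: regret matching driven by a learned regret estimator.
import numpy as np

class FunctionalRegretMatcher:
    def __init__(self, n_features, n_actions, lr=0.01):
        self.w = np.zeros((n_actions, n_features))  # one linear regressor per action
        self.lr = lr

    def estimated_regrets(self, phi):
        return self.w @ phi

    def policy(self, phi):
        """Regret matching on the *estimated* regrets."""
        r_plus = np.maximum(self.estimated_regrets(phi), 0.0)
        if r_plus.sum() <= 0.0:
            return np.full(len(r_plus), 1.0 / len(r_plus))
        return r_plus / r_plus.sum()

    def observe(self, phi, target_regrets):
        """One SGD step toward the observed cumulative regrets."""
        err = self.estimated_regrets(phi) - target_regrets
        self.w -= self.lr * np.outer(err, phi)

# Toy usage with made-up features and regret targets.
m = FunctionalRegretMatcher(n_features=4, n_actions=3)
phi = np.array([1.0, 0.0, 0.5, -0.5])
m.observe(phi, np.array([0.2, -0.1, 0.4]))
print(m.policy(phi))
```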


Optimal Estimation of Multivariate ARMA Models

AAAI Conferences

A central problem in applied data analysis is time series modeling--estimating and forecasting a discrete-time stochastic process--for which the autoregressive moving average (ARMA) and stochastic ARMA (Thiesson et al. 2012) are fundamental models. An ARMA model describes the behavior of a linear dynamical system under latent Gaussian perturbations (Brockwell and Davis 2002; Lütkepohl 2007), which affords intuitive modeling capability, efficient forecasting algorithms, and a close relationship to linear Gaussian state-space models (Katayama 2006, pp. 5-6). In this paper, we develop a tractable approach to maximum likelihood parameter estimation for stochastic multivariate ARMA models. To efficiently compute a globally optimal estimate, the problem is re-expressed as a regularized loss minimization, which then allows recent algorithmic advances in sparse estimation to be applied (Shah et al. 2012; Candes et al. 2011; Bach, Mairal, and Ponce 2008; Zhang et al. 2011; White et al. 2012). Although there has been recent progress in global estimation for ARMA, such approaches have either been restricted to single-input single-output ...
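
To make the "estimation as regularized loss minimization" reduction concrete, here is a hedged sketch on the easier, purely autoregressive case: fitting a vector autoregression (VAR) by ridge-regularized least squares with NumPy. The paper's actual contribution concerns the harder stochastic ARMA case with moving-average terms; nothing below is its algorithm.

```python
# Sketch: VAR estimation re-expressed as ridge-regularized regression.
import numpy as np

def fit_var_ridge(X, p=2, lam=0.1):
    """X: (T, d) series. Returns AR coefficient matrices A[0..p-1]."""
    T, d = X.shape
    # Stack lagged observations into a regression design matrix.
    Z = np.hstack([X[p - k - 1:T - k - 1] for k in range(p)])  # (T-p, p*d)
    Y = X[p:]                                                  # (T-p, d)
    # Ridge-regularized least squares: (Z'Z + lam I)^{-1} Z'Y
    B = np.linalg.solve(Z.T @ Z + lam * np.eye(p * d), Z.T @ Y)
    return B.reshape(p, d, d).transpose(0, 2, 1)  # A[k] maps x_{t-k-1} -> x_t

# Usage on a toy random-walk series.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3)).cumsum(axis=0)
A = fit_var_ridge(X, p=2, lam=1.0)
print([a.shape for a in A])
```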


Policy Tree: Adaptive Representation for Policy Gradient

AAAI Conferences

Much of the focus on finding good representations in reinforcement learning has been on learning complex non-linear predictors of value. Policy gradient algorithms, which directly represent the policy, often need fewer parameters to learn good policies. However, they typically employ a fixed parametric representation that may not be sufficient for complex domains. This paper introduces the Policy Tree algorithm, which can learn an adaptive representation of policy in the form of a decision tree over different instantiations of a base policy. Policy gradient is used both to optimize the parameters and to grow the tree by choosing splits that enable the maximum local increase in the expected return of the policy. Experiments show that this algorithm can choose genuinely helpful splits and significantly improve upon the commonly used linear Gibbs softmax policy, which we choose as our base policy.
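
The sketch below shows only the base policy the paper builds on: a linear Gibbs softmax policy with its standard REINFORCE-style gradient update. The Policy Tree algorithm would maintain one such parameter block per tree leaf and add the split that most increases expected return; that splitting machinery is omitted here, and all sizes are arbitrary.

```python
# Sketch: linear Gibbs softmax policy with a policy-gradient update.
import numpy as np

class GibbsSoftmaxPolicy:
    def __init__(self, n_features, n_actions, lr=0.05):
        self.theta = np.zeros((n_actions, n_features))
        self.lr = lr

    def probs(self, phi):
        z = self.theta @ phi
        e = np.exp(z - z.max())          # numerically stable softmax
        return e / e.sum()

    def sample(self, phi, rng):
        return rng.choice(len(self.theta), p=self.probs(phi))

    def update(self, phi, action, advantage):
        """grad log pi(a|s) row b = phi * (1[b == a] - pi(b|s))."""
        p = self.probs(phi)
        grad = -np.outer(p, phi)
        grad[action] += phi
        self.theta += self.lr * advantage * grad

# Toy usage with made-up features and a unit advantage.
policy = GibbsSoftmaxPolicy(n_features=4, n_actions=3)
rng = np.random.default_rng(0)
phi = rng.standard_normal(4)
a = policy.sample(phi, rng)
policy.update(phi, a, advantage=1.0)
```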


Search in Imperfect Information Games Using Online Monte Carlo Counterfactual Regret Minimization

AAAI Conferences

Online search in games has always been a core interest of artificial intelligence. Advances made in search for perfect information games (such as Chess, Checkers, Go, and Backgammon) have led to AI capable of defeating the world's top human experts. Search in imperfect information games (such as Poker, Bridge, and Skat) is significantly more challenging due to the complexities introduced by hidden information. In this paper, we present Online Outcome Sampling (OOS), the first imperfect information search algorithm that is guaranteed to converge to an equilibrium strategy in two-player zero-sum games. We show that OOS avoids common problems encountered by existing search algorithms and we experimentally evaluate its convergence rate and practical performance against benchmark strategies in Liar's Dice and a variant of Goofspiel. We show that, unlike with Information Set Monte Carlo Tree Search (ISMCTS), the exploitability of the strategies produced by OOS decreases as the amount of search time increases. In practice, OOS performs as well as ISMCTS in head-to-head play while producing strategies with lower exploitability given the same search time.
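
As a hedged illustration of the bookkeeping any MCCFR-style method shares, the sketch below keeps per-information-set cumulative regrets and a weighted average strategy, the quantity that converges to equilibrium. How OOS computes the importance-weighted sampled regrets fed into `update` is the paper's contribution and is deliberately left abstract; the class and names are our own.

```python
# Sketch: per-information-set state for regret matching in MCCFR-style search.
from collections import defaultdict

class InfoSetNode:
    def __init__(self, n_actions):
        self.regret_sum = [0.0] * n_actions
        self.strategy_sum = [0.0] * n_actions

    def strategy(self):
        """Current strategy via regret matching (uniform if no positive regret)."""
        r_plus = [max(r, 0.0) for r in self.regret_sum]
        total = sum(r_plus)
        n = len(r_plus)
        return [r / total for r in r_plus] if total > 0 else [1.0 / n] * n

    def update(self, sampled_regrets, reach_weight):
        """Accumulate sampled regrets and weight the running average strategy."""
        sigma = self.strategy()
        for a, r in enumerate(sampled_regrets):
            self.regret_sum[a] += r
            self.strategy_sum[a] += reach_weight * sigma[a]

    def average_strategy(self):
        """The average strategy is what converges to equilibrium."""
        total = sum(self.strategy_sum)
        n = len(self.strategy_sum)
        return ([s / total for s in self.strategy_sum]
                if total > 0 else [1.0 / n] * n)

nodes = defaultdict(lambda: InfoSetNode(2))   # infoset key -> node
node = nodes["P1:raise-history"]              # hypothetical infoset key
node.update(sampled_regrets=[0.3, -0.3], reach_weight=1.0)
print(node.average_strategy())
```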


Using Response Functions to Measure Strategy Strength

AAAI Conferences

Extensive-form games are a powerful tool for representing complex multi-agent interactions. Nash equilibrium strategies are commonly used as a solution concept for extensive-form games, but many games are too large for the computation of Nash equilibria to be tractable. In these large games, exploitability has traditionally been used to measure deviation from Nash equilibrium, and thus strategies are designed to achieve minimal exploitability. However, while exploitability measures a strategy's worst-case performance, it fails to capture how likely that worst case is to be observed in practice. In fact, empirical evidence has shown that a less exploitable strategy can perform worse than a more exploitable strategy in one-on-one play against a variety of opponents. In this work, we propose a class of response functions that can be used to measure the strength of a strategy. We prove that standard no-regret algorithms can be used to learn optimal strategies for a scenario where the opponent uses one of these response functions. We demonstrate the effectiveness of this technique in Leduc Hold'em against opponents that use the UCT Monte Carlo tree search algorithm.
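
A minimal sketch of the learning setup, under toy assumptions: regret matching in a zero-sum matrix game against an opponent that plays a fixed response function of our current strategy, here a softened best response in matching pennies. The paper's response-function class and its games are richer; the payoff matrix and `soft_response` are stand-ins.

```python
# Sketch: a no-regret learner against a fixed response-function opponent.
import numpy as np

U = np.array([[1.0, -1.0], [-1.0, 1.0]])   # matching pennies payoffs (toy)

def soft_response(p, temp=1.0):
    """Opponent: softmax best response to our mixed strategy p."""
    z = -(U.T @ p) / temp                  # opponent minimizes our payoff
    e = np.exp(z - z.max())
    return e / e.sum()

regret = np.zeros(2)
avg = np.zeros(2)
for t in range(10000):
    r_plus = np.maximum(regret, 0.0)
    p = r_plus / r_plus.sum() if r_plus.sum() > 0 else np.full(2, 0.5)
    q = soft_response(p)                   # opponent applies its response fn
    payoffs = U @ q                        # expected payoff of each action
    regret += payoffs - p @ payoffs        # regret-matching update
    avg += p
print("learned strategy:", avg / avg.sum())
```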


Solving Imperfect Information Games Using Decomposition

AAAI Conferences

Decomposition, i.e., independently analyzing possible subgames, has proven to be an essential principle for effective decision-making in perfect information games. However, in imperfect information games, decomposition has proven problematic. To date, all proposed techniques for decomposition in imperfect information games have abandoned theoretical guarantees. This work presents the first technique for decomposing an imperfect information game into subgames that can be solved independently while retaining optimality guarantees on the full-game solution. We can use this technique to construct theoretically justified algorithms that make better use of information available at run-time, overcome memory or disk limitations at run-time, or make a time/space trade-off to overcome memory or disk limitations while solving a game. In particular, we present an algorithm for subgame solving that guarantees performance in the whole game, in contrast to existing methods which may have unbounded error. In addition, we present an offline game solving algorithm, CFR-D, which can produce a Nash equilibrium for a game that is larger than available storage.
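
The sketch below is only a guess at the shape of the data a decomposition method must carry across the trunk/subgame boundary, based on the abstract's description: reach probabilities over subgame root states plus opponent counterfactual values that constrain how the subgame may be re-solved. All field names and example values are hypothetical.

```python
# Sketch: the summary a trunk hands to an independent subgame solver.
from dataclasses import dataclass, field

@dataclass
class SubgameSummary:
    root_id: str
    # our probability of reaching each possible root state
    own_reach: dict[str, float] = field(default_factory=dict)
    # opponent counterfactual value at each of their root infosets;
    # a re-solved subgame strategy must not let the opponent exceed
    # these, which is what preserves the full-game guarantee
    opp_cfv: dict[str, float] = field(default_factory=dict)

# Example: summarizing a poker-like subgame root (toy values).
summary = SubgameSummary(
    root_id="river",
    own_reach={"AA": 0.4, "KQ": 0.6},
    opp_cfv={"opp_infoset_1": 0.25, "opp_infoset_2": -0.1},
)
print(summary)
```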