 Parr, Ronald


Sample Complexity and Performance Bounds for Non-Parametric Approximate Linear Programming

AAAI Conferences

One of the most difficult tasks in value function approximation for Markov Decision Processes is finding an approximation architecture that is expressive enough to capture the important structure in the value function, while at the same time not overfitting the training samples. Recent results in non-parametric approximate linear programming (NP-ALP) have demonstrated that this can be done effectively using nothing more than a smoothness assumption on the value function. In this paper we extend these results to the case where samples come from real-world transitions instead of the full Bellman equation, adding robustness to noise. In addition, we provide the first max-norm, finite-sample performance guarantees for any form of ALP. NP-ALP is amenable to problems with large (multidimensional) or even infinite (continuous) action spaces, and does not require a model to select actions using the resulting approximate solution.
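
A minimal sketch of the linear program at the heart of this approach, assuming sampled transitions (s_i, a_i, r_i, s'_i), a state-relevance weighting \rho, a metric d, and a Lipschitz constant L_v; the constraint set and constants are illustrative, not a restatement of the paper's formulation:

    \min_{v \in \mathbb{R}^n} \; \sum_i \rho(s_i)\, v(s_i)
    \text{s.t.} \quad v(s_i) \ge r_i + \gamma\, v(s'_i) \quad \text{(sampled Bellman constraints)}
    \qquad\;\;\, |v(s_i) - v(s_j)| \le L_v\, d(s_i, s_j) \quad \text{(smoothness constraints)}

The decision variables are the values at the sampled states themselves rather than weights on features chosen a priori, which is what makes the method non-parametric; the Lipschitz constraints are what extend the solution off the samples.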


Computing Optimal Strategies to Commit to in Stochastic Games

AAAI Conferences

Significant progress has been made recently in the following two lines of research at the intersection of AI and game theory: (1) the computation of optimal strategies to commit to (Stackelberg strategies), and (2) the computation of correlated equilibria of stochastic games. In this paper, we unite these two lines of research by studying the computation of Stackelberg strategies in stochastic games. We provide theoretical results on the value of being able to commit and the value of being able to correlate, as well as complexity results about computing Stackelberg strategies in stochastic games. We then modify the QPACE algorithm (MacDermed et al. 2011) to compute Stackelberg strategies, and provide experimental results.
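
As background for line (1), and not the stochastic-game algorithm of this paper: in a one-shot bimatrix game, an optimal mixed strategy to commit to can be computed by solving one LP per candidate follower response and keeping the best feasible solution (the multiple-LPs method of Conitzer and Sandholm 2006). A sketch in Python, where the payoff matrices U_L and U_F are hypothetical inputs:

    import numpy as np
    from scipy.optimize import linprog

    def stackelberg_mixed_strategy(U_L, U_F):
        """One LP per follower action j: maximize the leader's expected
        payoff subject to j being a best response to the leader's mix x."""
        n, m = U_L.shape
        best_val, best_x = -np.inf, None
        for j in range(m):
            c = -U_L[:, j]                                   # linprog minimizes
            # follower must weakly prefer j to every alternative k
            rows = [U_F[:, k] - U_F[:, j] for k in range(m) if k != j]
            res = linprog(c,
                          A_ub=np.array(rows) if rows else None,
                          b_ub=np.zeros(len(rows)) if rows else None,
                          A_eq=np.ones((1, n)), b_eq=[1.0])  # x is a distribution
            if res.success and -res.fun > best_val:
                best_val, best_x = -res.fun, res.x
        return best_x, best_val

In a stochastic game the commitment is to behavior over time and correlation becomes available, which is where the modified QPACE machinery enters.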


Greedy Algorithms for Sparse Reinforcement Learning

arXiv.org Machine Learning

Feature selection and regularization are becoming increasingly prominent tools in the efforts of the reinforcement learning (RL) community to expand the reach and applicability of RL. One approach to the problem of feature selection is to impose a sparsity-inducing form of regularization on the learning method. Recent work on $L_1$ regularization has adapted techniques from the supervised learning literature for use with RL. Another approach that has received renewed attention in the supervised learning community is that of using a simple algorithm that greedily adds new features. Such algorithms have many of the good properties of the $L_1$ regularization methods, while also being extremely efficient and, in some cases, allowing theoretical guarantees on recovery of the true form of a sparse target function from sampled data. This paper considers variants of orthogonal matching pursuit (OMP) applied to reinforcement learning. The resulting algorithms are analyzed and compared experimentally with existing $L_1$ regularized approaches. We demonstrate that sparse recovery fails in perhaps the most natural scenario in which one might hope to achieve it; however, one variant, OMP-BRM, provides promising theoretical guarantees under certain assumptions on the feature dictionary. Another variant, OMP-TD, empirically outperforms prior methods both in approximation accuracy and efficiency on several benchmark problems.
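
A minimal sketch of the greedy loop in the spirit of OMP-TD, assuming feature matrices Phi and Phi_next (features at sampled states and their successors), sampled rewards r, and a fixed budget of k features; the actual algorithm uses a correlation threshold rather than a fixed budget, so this is illustrative only:

    import numpy as np

    def omp_td_sketch(Phi, Phi_next, r, gamma, k):
        """Greedily add the feature most correlated with the current Bellman
        residual, then recompute the least-squares TD fixed point on the
        selected feature set."""
        n, p = Phi.shape
        selected, w = [], np.zeros(p)
        for _ in range(k):
            residual = r + gamma * (Phi_next @ w) - Phi @ w
            corr = np.abs(Phi.T @ residual) / n
            corr[selected] = -np.inf                 # skip features already chosen
            selected.append(int(np.argmax(corr)))
            S = selected
            A = Phi[:, S].T @ (Phi[:, S] - gamma * Phi_next[:, S])
            b = Phi[:, S].T @ r
            w = np.zeros(p)
            w[S] = np.linalg.lstsq(A, b, rcond=None)[0]  # TD fixed point on S
        return w, selected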


Non-Parametric Approximate Linear Programming for MDPs

AAAI Conferences

The Approximate Linear Programming (ALP) approach to value function approximation for MDPs is a parametric value function approximation method, in that it represents the value function as a linear combination of features which are chosen a priori. Choosing these features can be a difficult challenge in itself. One recent effort, Regularized Approximate Linear Programming (RALP), uses L1 regularization to address this issue by combining a large initial set of features with a regularization penalty that favors a smooth value function with few non-zero weights. Rather than using smoothness as a backhanded way of addressing the feature selection problem, this paper starts with smoothness and develops a non-parametric approach to ALP that is consistent with the smoothness assumption. We show that this new approach has some favorable practical and analytical properties in comparison to (R)ALP.
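
For concreteness, one common way to write the RALP program that this paper takes as its point of departure, with \Phi a large feature matrix, \rho a state-relevance weighting, T the sampled Bellman operator, and \psi a regularization level (shown in constraint form; a penalty in the objective is equivalent for a suitable \psi):

    \min_{w} \; \rho^\top \Phi w \quad \text{s.t.} \quad \Phi w \ge T\Phi w \ \text{(on samples)}, \qquad \|w\|_1 \le \psi

The non-parametric approach developed here drops the features entirely and optimizes over the sampled values directly, subject to the smoothness (Lipschitz) constraints that the L1 penalty was only implicitly encouraging.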


Linear Complementarity for Regularized Policy Evaluation and Improvement

Neural Information Processing Systems

Recent work in reinforcement learning has emphasized the power of L1 regularization to perform feature selection and prevent overfitting. We propose formulating the L1 regularized linear fixed point problem as a linear complementarity problem (LCP). This formulation offers several advantages over the LARS-inspired formulation, LARS-TD. The LCP formulation allows the use of efficient off-the-shelf solvers, leads to a new uniqueness result, and can be initialized with starting points from similar problems (warm starts). We demonstrate that warm starts, as well as the efficiency of LCP solvers, can speed up policy iteration. Moreover, warm starts permit a form of modified policy iteration that can be used to approximate a "greedy" homotopy path, a generalization of the LARS-TD homotopy path that combines policy evaluation and optimization.
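
A sketch of the reduction, assuming the usual LARS-TD quantities A = \Phi^\top(\Phi - \gamma\Phi') and b = \Phi^\top R and regularization parameter \beta (the paper's matrices may carry a sample-distribution weighting). Splitting w = w^+ - w^- with w^+, w^- \ge 0 and stacking z = (w^+, w^-), the fixed-point condition b - Aw \in \beta\,\partial\|w\|_1 becomes: find z \ge 0 with

    Mz + q \ge 0, \qquad z^\top (Mz + q) = 0, \qquad
    M = \begin{pmatrix} A & -A \\ -A & A \end{pmatrix}, \quad
    q = \begin{pmatrix} \beta\mathbf{1} - b \\ \beta\mathbf{1} + b \end{pmatrix}.

Componentwise this enforces |b - Aw|_i \le \beta everywhere, with equality at the matching sign wherever w_i \ne 0, which is exactly the subgradient condition; any off-the-shelf LCP solver then applies, and a solution from a nearby problem supplies the warm start.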


Complexity of Computing Optimal Stackelberg Strategies in Security Resource Allocation Games

AAAI Conferences

Recently, algorithms for computing game-theoretic solutions have been deployed in real-world security applications, such as the placement of checkpoints and canine units at Los Angeles International Airport. These algorithms assume that the defender (security personnel) can commit to a mixed strategy, a so-called Stackelberg model. As pointed out by Kiekintveld et al. (2009), in these applications, generally, multiple resources need to be assigned to multiple targets, resulting in an exponential number of pure strategies for the defender. In this paper, we study how to compute optimal Stackelberg strategies in such games, showing that this can be done in polynomial time in some cases, and is NP-hard in others.


Hierarchical Linear/Constant Time SLAM Using Particle Filters for Dense Maps

Neural Information Processing Systems

We present an improvement to the DP-SLAM algorithm for simultaneous localization and mapping (SLAM) that maintains multiple hypotheses about densely populated maps (one full map per particle in a particle filter) in time that is linear in all significant algorithm parameters and takes constant (amortized) time per iteration. This means that the asymptotic complexity of the algorithm is no greater than that of a pure localization algorithm using a single map and the same number of particles. We also present a hierarchical extension of DP-SLAM that uses a two-level particle filter which models drift in the particle filtering process itself. The hierarchical approach enables recovery from the inevitable drift that results from using a finite number of particles in a particle filter and permits the use of DP-SLAM in more challenging domains, while maintaining linear-time asymptotic complexity.
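
A deliberately simplified sketch of the shared-map idea behind DP-SLAM: all particles share one occupancy grid, each cell records updates keyed by the id of the particle that made them, and a read walks up the particle ancestry tree to the nearest ancestor that observed the cell. The actual algorithm maintains balanced trees per cell and prunes the ancestry tree to obtain its amortized bounds; the class and method names here are hypothetical:

    class Particle:
        """Node in the particle ancestry tree."""
        _next_id = 0
        def __init__(self, parent=None):
            self.id = Particle._next_id
            Particle._next_id += 1
            self.parent = parent

    class SharedMap:
        """One grid shared by all particles; a cell stores only the updates
        made along particle lineages, keyed by particle id."""
        def __init__(self, width, height):
            self.cells = [[{} for _ in range(width)] for _ in range(height)]
        def write(self, particle, x, y, occupancy):
            self.cells[y][x][particle.id] = occupancy
        def read(self, particle, x, y):
            cell, p = self.cells[y][x], particle
            while p is not None:          # nearest ancestor that wrote this cell
                if p.id in cell:
                    return cell[p.id]
                p = p.parent
            return None                   # unobserved along this lineage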


Learning in Zero-Sum Team Markov Games Using Factored Value Functions

Neural Information Processing Systems

We present a new method for learning good strategies in zero-sum Markov games in which each side is composed of multiple agents collaborating against an opposing team of agents. Our method requires full observability and communication during learning, but the learned policies can be executed in a distributed manner. The value function is represented as a factored linear architecture and its structure determines the necessary computational resources and communication bandwidth. This approach permits a tradeoff between simple representations with little or no communication between agents and complex, computationally intensive representations with extensive coordination between agents. Thus, we provide a principled means of using approximation to combat the exponential blowup in the joint action space of the participants. The approach is demonstrated with an example that shows the efficiency gains over naive enumeration.
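
A small sketch of the coordination step such a factored representation enables, assuming the joint payoff decomposes as a sum of factors each touching only a few agents; agents are then eliminated one at a time (variable elimination), so the cost depends on factor scopes rather than the full joint action space. The factor scopes, names, and payoff tables below are illustrative:

    def max_out(factors, agent, domain):
        """Eliminate one agent: all factors mentioning it are replaced by a
        single new factor over the remaining variables in their scopes."""
        touching = [f for f in factors if agent in f[0]]
        rest = [f for f in factors if agent not in f[0]]
        scope = tuple(sorted({v for s, _ in touching for v in s} - {agent}))
        cache = {}
        def fn(assign, touching=touching, agent=agent, domain=domain, scope=scope):
            key = tuple(assign[v] for v in scope)
            if key not in cache:
                cache[key] = max(sum(g({**assign, agent: a}) for _, g in touching)
                                 for a in domain)
            return cache[key]
        return rest + [(scope, fn)]

    def coordinate(factors, order, domains):
        """Value of the best joint action, eliminating agents in order."""
        for agent in order:
            factors = max_out(factors, agent, domains[agent])
        return sum(g({}) for _, g in factors)   # all scopes are empty now

    # Two agents who are paid for matching, plus a private bonus for agent a1:
    factors = [(("a1", "a2"), lambda t: 1.0 if t["a1"] == t["a2"] else 0.0),
               (("a1",), lambda t: 0.5 if t["a1"] == 1 else 0.0)]
    print(coordinate(factors, ["a2", "a1"], {"a1": [0, 1], "a2": [0, 1]}))  # 1.5

Recovering the maximizing joint action itself takes a standard backward pass over the same eliminations.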


Multiagent Planning with Factored MDPs

Neural Information Processing Systems

We present a principled and efficient planning algorithm for cooperative multiagent dynamic systems. A striking feature of our method is that the coordination and communication between the agents is not imposed, but derived directly from the system dynamics and function approximation architecture. We view the entire multiagent system as a single, large Markov decision process (MDP), which we assume can be represented in a factored way using a dynamic Bayesian network (DBN). The action space of the resulting MDP is the joint action space of the entire set of agents. Our approach is based on the use of factored linear value functions as an approximation to the joint value function.
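
The structural assumption can be sketched in one line, with notation that is illustrative rather than the paper's: the joint value function is approximated as

    V(\mathbf{x}) \approx \sum_j w_j\, h_j(\mathbf{x}[C_j])

where each basis function h_j depends only on a small subset C_j of the state variables. Because the transition model is a DBN, the one-step backprojection of each h_j also has small scope, so both planning and the induced coordination among agents operate on a sparse graph of local interactions rather than on the exponential joint action space.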