Agents
Policy-Gradient Methods for Planning
Probabilistic temporal planning attempts to find good policies for acting in domains with concurrent durative tasks, multiple uncertain outcomes, and limited resources. These domains are typically modelled as Markov decision problems and solved using dynamic programming methods. This paper demonstrates the application of reinforcement learning -- in the form of a policy-gradient method -- to these domains. Our emphasis is large domains that are infeasible for dynamic programming. Our approach is to construct simple policies, or agents, for each planning task. The result is a general probabilistic temporal planner, named the Factored Policy-Gradient Planner (FPG-Planner), which can handle hundreds of tasks, optimising for probability of success, duration, and resource use.
Cyclic Equilibria in Markov Games
Zinkevich, Martin, Greenwald, Amy, Littman, Michael L.
Although variants of value iteration have been proposed for finding Nash or correlated equilibria in general-sum Markov games, these variants have not been shown to be effective in general. In this paper, we demonstrate by construction that existing variants of value iteration cannot find stationary equilibrium policies in arbitrary general-sum Markov games. Instead, we propose an alternative interpretation of the output of value iteration based on a new (non-stationary) equilibrium concept that we call "cyclic equilibria." We prove that value iteration identifies cyclic equilibria in a class of games in which it fails to find stationary equilibria. We also demonstrate empirically that value iteration finds cyclic equilibria in nearly all examples drawn from a random distribution of Markov games.
Oblivious Equilibrium: A Mean Field Approximation for Large-Scale Dynamic Games
Weintraub, Gabriel Y., Benkard, Lanier, Roy, Benjamin Van
We propose a mean-field approximation that dramatically reduces the computational complexity of solving stochastic dynamic games. We provide conditions that guarantee our method approximates an equilibrium as the number of agents grow. We then derive a performance bound to assess how well the approximation performs for any given number of agents. We apply our method to an important class of problems in applied microeconomics. We show with numerical experiments that we are able to greatly expand the set of economic problems that can be analyzed computationally.
Bayesian models of human action understanding
Baker, Chris, Saxe, Rebecca, Tenenbaum, Joshua B.
We present a Bayesian framework for explaining how people reason about and predict the actions of an intentional agent, based on observing its behavior. Action-understanding is cast as a problem of inverting a probabilistic generative model, which assumes that agents tend to act rationally in order to achieve their goals given the constraints of their environment. Working in a simple sprite-world domain, we show how this model can be used to infer the goal of an agent and predict how the agent will act in novel situations or when environmental constraints change. The model provides a qualitative account of several kinds of inferences that preverbal infants have been shown to perform, and also fits quantitative predictions that adult observers make in a new experiment.
On Local Rewards and Scaling Distributed Reinforcement Learning
We consider the scaling of the number of examples necessary to achieve good performance in distributed, cooperative, multi-agent reinforcement learning, as a function of the the number of agents n. We prove a worstcase lower bound showing that algorithms that rely solely on a global reward signal to learn policies confront a fundamental limit: They require a number of real-world examples that scales roughly linearly in the number of agents. For settings of interest with a very large number of agents, this is impractical. We demonstrate, however, that there is a class of algorithms that, by taking advantage of local reward signals in large distributed Markov Decision Processes, are able to ensure good performance with a number of samples that scales as O(log n). This makes them applicable even in settings with a very large number of agents n.
Policy-Gradient Methods for Planning
Probabilistic temporal planning attempts to find good policies for acting in domains with concurrent durative tasks, multiple uncertain outcomes, and limited resources. These domains are typically modelled as Markov decision problems and solved using dynamic programming methods. This paper demonstrates the application of reinforcement learning -- in the form of a policy-gradient method -- to these domains. Our emphasis is large domains that are infeasible for dynamic programming. Our approach is to construct simple policies, or agents, for each planning task. The result is a general probabilistic temporal planner, named the Factored Policy-Gradient Planner (FPG-Planner), which can handle hundreds of tasks, optimising for probability of success, duration, and resource use.
Cyclic Equilibria in Markov Games
Zinkevich, Martin, Greenwald, Amy, Littman, Michael L.
Although variants of value iteration have been proposed for finding Nash or correlated equilibria in general-sum Markov games, these variants have not been shown to be effective in general. In this paper, we demonstrate byconstruction that existing variants of value iteration cannot find stationary equilibrium policies in arbitrary general-sum Markov games. Instead, we propose an alternative interpretation of the output of value iteration basedon a new (non-stationary) equilibrium concept that we call "cyclic equilibria." We prove that value iteration identifies cyclic equilibria ina class of games in which it fails to find stationary equilibria. We also demonstrate empirically that value iteration finds cyclic equilibria in nearly all examples drawn from a random distribution of Markov games.
Oblivious Equilibrium: A Mean Field Approximation for Large-Scale Dynamic Games
Weintraub, Gabriel Y., Benkard, Lanier, Roy, Benjamin Van
We propose a mean-field approximation that dramatically reduces the computational complexity of solving stochastic dynamic games. We provide conditionsthat guarantee our method approximates an equilibrium as the number of agents grow. We then derive a performance bound to assess how well the approximation performs for any given number of agents. We apply our method to an important class of problems in applied microeconomics.We show with numerical experiments that we are able to greatly expand the set of economic problems that can be analyzed computationally.
Bayesian models of human action understanding
Baker, Chris, Saxe, Rebecca, Tenenbaum, Joshua B.
We present a Bayesian framework for explaining how people reason about and predict the actions of an intentional agent, based on observing itsbehavior. Action-understanding is cast as a problem of inverting a probabilistic generative model, which assumes that agents tend to act rationally in order to achieve their goals given the constraints of their environment. Workingin a simple sprite-world domain, we show how this model can be used to infer the goal of an agent and predict how the agent will act in novel situations or when environmental constraints change. The model provides a qualitative account of several kinds of inferences that preverbal infants have been shown to perform, and also fits quantitative predictionsthat adult observers make in a new experiment.