Optimal and Approximate Q-value Functions for Decentralized POMDPs

Oliehoek, F. A., Spaan, M. T. J., Vlassis, N.

May 28, 2008, Journal of Artificial Intelligence Research

Decision-theoretic planning is a popular approach to sequential decision-making problems because it treats uncertainty in sensing and acting in a principled way. In single-agent frameworks like MDPs and POMDPs, planning can be carried out using Q-value functions: an optimal Q-value function Q* is computed recursively by dynamic programming, and an optimal policy is then extracted from Q*. In this paper we study whether similar Q-value functions can be defined for decentralized POMDP models (Dec-POMDPs), and how policies can be extracted from such value functions. We define two forms of the optimal Q-value function for Dec-POMDPs: one that gives a normative description as the Q-value function of an optimal pure joint policy, and another that is sequentially rational and thus gives a recipe for computation. This computation, however, is infeasible for all but the smallest problems. Therefore, we analyze various approximate Q-value functions that allow for efficient computation. We describe how they relate, and we prove that they all provide an upper bound to the optimal Q-value function Q*. Finally, unifying some previous approaches for solving Dec-POMDPs, we describe a family of algorithms for extracting policies from such Q-value functions, and perform an experimental evaluation on existing test problems, including a new firefighting benchmark problem.
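
The single-agent case mentioned above (compute Q* by dynamic programming, then extract a policy from it) can be illustrated with a minimal sketch. This is not the paper's Dec-POMDP method; it is plain Q-value iteration on a fully observable MDP. The function names, the array layout `T[s, a, s']` and `R[s, a]`, and the toy two-state problem are all assumptions made for illustration.

```python
import numpy as np

def q_value_iteration(T, R, gamma=0.9, tol=1e-8, max_iters=10_000):
    """Approximate Q* for a finite MDP by repeated Bellman backups.

    T[s, a, s2] is the probability of moving from s to s2 under action a,
    R[s, a] is the immediate reward (hypothetical layout, chosen for this sketch).
    """
    n_states, n_actions, _ = T.shape
    Q = np.zeros((n_states, n_actions))
    for _ in range(max_iters):
        V = Q.max(axis=1)               # V(s) = max_a Q(s, a)
        Q_new = R + gamma * (T @ V)     # Q(s, a) = R(s, a) + gamma * sum_s2 T(s, a, s2) V(s2)
        if np.max(np.abs(Q_new - Q)) < tol:
            return Q_new
        Q = Q_new
    return Q

def greedy_policy(Q):
    """Extract a deterministic policy by picking the maximizing action per state."""
    return Q.argmax(axis=1)

# Toy MDP (assumed for illustration): action 0 leads to state 0,
# action 1 leads to state 1; only taking action 1 in state 1 is rewarded.
T = np.array([
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [0.0, 1.0]],
])
R = np.array([
    [0.0, 0.0],
    [0.0, 1.0],
])

Q_star = q_value_iteration(T, R, gamma=0.9)
policy = greedy_policy(Q_star)
```

In this toy problem the greedy policy selects action 1 in both states, since reaching and staying in state 1 is the only way to collect reward. In a Dec-POMDP the extraction step is harder because each agent acts on its own observation history rather than on a shared state, which is the difficulty the abstract refers to.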

agent, dec-pomdp, q-value function

Country:
- North America > United States (0.27)
- Europe
  - Portugal > Lisbon
    - Lisbon (0.04)
  - Netherlands > North Holland
    - Amsterdam (0.04)
  - Greece > Crete
    - Chania (0.04)

Genre:
- Overview (0.45)

Industry:
- Law Enforcement & Public Safety > Fire & Emergency Services (0.48)
- Leisure & Entertainment
  - Games (0.67)
  - Sports (0.45)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)