Stick-Breaking Policy Learning in Dec-POMDPs
Miao Liu, Christopher Amato, Xuejun Liao, Lawrence Carin, Jonathan P. How
Expectation maximization (EM) has recently been shown to be an efficient algorithm for learning finite-state controllers (FSCs) in large decentralized POMDPs (Dec-POMDPs). However, current methods use fixed-size FSCs and often converge to maxima that are far from optimal. This paper considers a variable-size FSC to represent the local policy of each agent. These variable-size FSCs are constructed using a stick-breaking prior, leading to a new framework called decentralized stick-breaking policy representation (Dec-SBPR). This approach learns the controller parameters with a variational Bayesian algorithm without having to assume that the Dec-POMDP model is available. The performance of Dec-SBPR is demonstrated on several benchmark problems, showing that the algorithm scales to large problems while outperforming other state-of-the-art methods.
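The abstract's key construction is the stick-breaking prior over controller nodes, which lets the effective FSC size grow with the data instead of being fixed in advance. The sketch below is only an illustration of the generic truncated stick-breaking construction, not the authors' Dec-SBPR algorithm; the function name `stick_breaking_weights` and the parameters `alpha` and `max_nodes` are illustrative choices.

```python
import numpy as np

def stick_breaking_weights(alpha, max_nodes, rng=None):
    """Draw truncated stick-breaking weights over candidate FSC nodes.

    Each weight is the fraction of the remaining "stick" broken off by a
    Beta(1, alpha) draw; a small alpha concentrates mass on the first few
    nodes, so most probability falls on a small effective controller unless
    the data favor more nodes.
    """
    rng = np.random.default_rng() if rng is None else rng
    betas = rng.beta(1.0, alpha, size=max_nodes)        # stick-breaking proportions
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
    return betas * remaining                            # weight assigned to each node

# Example: truncate at 20 candidate nodes; most mass lands on a handful of them.
w = stick_breaking_weights(alpha=1.0, max_nodes=20)
print(np.round(w, 3), w.sum())
```

In Dec-SBPR this kind of prior is placed on each agent's node-transition probabilities and the posterior over controller size is inferred variationally; the snippet only shows how the prior itself biases toward compact controllers.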
arXiv.org Artificial Intelligence
Nov-23-2015