VDCBPI: an Approximate Scalable Algorithm for Large POMDPs

Poupart, Pascal, Boutilier, Craig

Neural Information Processing Systems 

Existing algorithms for discrete partially observable Markov decision processes can at best solve problems of a few thousand states due to two important sources of intractability: the curse of dimensionality and the policy space complexity. This paper describes a new algorithm (VDCBPI) that mitigates both sources of intractability by combining the Value Directed Compression (VDC) technique [13] with Bounded Policy Iteration (BPI) [14]. The scalability of VDCBPI is demonstrated on synthetic network management problems with up to 33 million states.

1 Introduction

Partially observable Markov decision processes (POMDPs) provide a natural and expressive framework for decision making, but their use in practice has been limited by the lack of scalable solution algorithms. Two important sources of intractability plague discrete model-based POMDPs: the high dimensionality of belief space, and the complexity of policy or value function (VF) space. Classic solution algorithms [4, 10, 7], for example, compute value functions represented by exponentially many value vectors, each of exponential size. As a result, they can only solve POMDPs with on the order of 100 states.
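To make the vector representation concrete: a POMDP value function over beliefs is piecewise linear and convex, and can be written as V(b) = max_alpha alpha . b over a finite set of value ("alpha") vectors. The minimal NumPy sketch below (not from the paper; the function name and numbers are illustrative) evaluates such a value function at a belief. Both sources of intractability show up directly here: the belief dimension |S| and the number of vectors k can each grow exponentially.

```python
import numpy as np

def pomdp_value(belief, alpha_vectors):
    """Value of a belief under a vector-based POMDP value function.

    belief: length-|S| probability distribution over states.
    alpha_vectors: array of shape (k, |S|); each row is one value vector.
    Returns max over vectors of the inner product alpha . belief.
    """
    return np.max(alpha_vectors @ belief)

# Toy 3-state example with two value vectors (numbers are illustrative).
alphas = np.array([[1.0, 0.0, 2.0],
                   [0.5, 1.5, 0.5]])
b = np.array([0.2, 0.5, 0.3])
print(pomdp_value(b, alphas))  # max(0.8, 1.0) = 1.0
```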
