VDCBPI: an Approximate Scalable Algorithm for Large POMDPs
Neural Information Processing Systems
Existing algorithms for discrete partially observable Markov decision processes can at best solve problems with a few thousand states, owing to two important sources of intractability: the curse of dimensionality and the complexity of the policy space. This paper describes a new algorithm (VDCBPI) that mitigates both sources of intractability by combining the Value Directed Compression (VDC) technique [13] with Bounded Policy Iteration (BPI) [14]. The scalability of VDCBPI is demonstrated on synthetic network management problems with up to 33 million states.

Partially observable Markov decision processes (POMDPs) provide a natural and expressive framework for decision making, but their use in practice has been limited by the lack of scalable solution algorithms. Two important sources of intractability plague discrete model-based POMDPs: the high dimensionality of the belief space, and the complexity of the policy or value function (VF) space.
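The text says VDC [13] attacks the first source of intractability, the dimensionality of the belief space, but gives no details of how. The sketch below is only an illustration and assumes a Krylov-subspace view of lossless linear compression. It builds an orthonormal matrix `F` whose columns span the smallest subspace that contains the reward vector and is closed under the transition matrix, then replaces each belief `b` with the shorter vector `F.T @ b`. The helper names `krylov_basis` and `compress_dynamics` and the 4-state toy model are hypothetical. Actions and observations are omitted for brevity. VDCBPI is described as approximate, so the compression it actually uses may differ from this exact variant, and the BPI half of the algorithm [14] is not shown.

```python
import numpy as np


def krylov_basis(reward_vectors, matrices, tol=1e-9):
    """Orthonormal basis (as columns) of the smallest subspace that contains
    every reward vector and is closed under multiplication by every matrix.
    Hypothetical helper for illustration, not the paper's implementation."""
    basis = []
    queue = [np.asarray(r, dtype=float) for r in reward_vectors]
    while queue:
        v = queue.pop()
        # Gram-Schmidt: strip the components already spanned by the basis.
        for q in basis:
            v = v - (q @ v) * q
        norm = np.linalg.norm(v)
        if norm > tol:
            q = v / norm
            basis.append(q)
            # The images of every new direction must be spanned as well.
            queue.extend(M @ q for M in matrices)
    return np.column_stack(basis)


def compress_dynamics(F, matrices):
    """Compressed matrices Tc with M @ F == F @ Tc. This is exact here because
    span(F) is invariant under each M and F has orthonormal columns."""
    return [F.T @ M @ F for M in matrices]


# Hypothetical toy model: two binary state variables x and y, 4 states
# indexed 2*x + y. Reward depends on x only, and x evolves independently of y.
Tx = np.array([[0.9, 0.1], [0.2, 0.8]])
Ty = np.array([[0.7, 0.3], [0.4, 0.6]])
T = np.kron(Tx, Ty)                    # T[s, s'] = Pr(s' | s)
R = np.kron([0.0, 1.0], [1.0, 1.0])    # reward 1 whenever x = 1

F = krylov_basis([R], [T])
(Tc,) = compress_dynamics(F, [T])
print(F.shape)  # (4, 2): beliefs over 4 states compress to 2 numbers

b = np.array([0.1, 0.2, 0.3, 0.4])    # a belief over the 4 states
b_c = F.T @ b                          # compressed belief
# Expected reward is preserved: b . R == b_c . (F^T R)
print(np.isclose(b @ R, b_c @ (F.T @ R)))  # True
# Belief prediction b' = T^T b commutes with compression.
print(np.allclose(F.T @ (T.T @ b), Tc.T @ b_c))  # True
```

In this toy case the reward and the dynamics of `x` never depend on `y`, so two numbers per belief suffice instead of four. The hoped-for payoff on large factored problems is the same kind of reduction, with the caveat that an exact subspace is often nearly as large as the state space. That caveat is one reason a lossy, approximate compression can be attractive.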
Apr-6-2023, 15:44:06 GMT