Approximate Planning in Large POMDPs via Reusable Trajectories
Kearns, Michael J., Mansour, Yishay, Ng, Andrew Y.
–Neural Information Processing Systems
We consider the problem of reliably choosing a near-best strategy from a restricted class of strategies TI in a partially observable Markov decision process(POMDP). We assume we are given the ability to simulate the POMDP, and study what might be called the sample complexity - that is, the amount of data one must generate in the POMDP in order to choose a good strategy. We prove upper bounds on the sample complexity showingthat, even for infinitely large and arbitrarily complex POMDPs, the amount of data needed can be finite, and depends only linearly on the complexity of the restricted strategy class TI, and exponentially onthe horizon time. This latter dependence can be eased in a variety of ways, including the application of gradient and local search algorithms.
Neural Information Processing Systems
Dec-31-2000