Reinforcement Learning Using Approximate Belief States

Neural Information Processing Systems 

The problem of developing good policies for partially observable Markov decision problems (POMDPs) remains one of the most challenging areas of research in stochastic planning. One line of research in this area involves the use of reinforcement learning with belief states, probability distributions over the underlying model states. This is a promising method for small problems, but its application is limited by the intractability of computing or representing a full belief state for large problems. Recent work shows that, in many settings, we can maintain an approximate belief state which is fairly close to the true belief state. In particular, great success has been shown with approximate belief states that marginalize out correlations between state variables.
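The two objects discussed above — the exact belief update and a factored approximation that marginalizes out correlations — can be sketched as follows. The transition tensor `T`, observation matrix `O`, and the two-variable example below are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def belief_update(b, T, O, a, o):
    """Exact Bayesian belief update for a discrete POMDP:
    b'(s') is proportional to O[s', o] * sum_s T[s, a, s'] * b(s)."""
    b_pred = b @ T[:, a, :]      # predict step: push belief through the transition model
    b_new = b_pred * O[:, o]     # correct step: weight by observation likelihood
    return b_new / b_new.sum()   # renormalize to a probability distribution

def factored_approximation(b_joint, shape):
    """Approximate a joint belief over several state variables by the
    product of its marginals, discarding correlations between variables."""
    b = b_joint.reshape(shape)
    marginals = [
        b.sum(axis=tuple(j for j in range(len(shape)) if j != i))
        for i in range(len(shape))
    ]
    # Product of marginals, flattened back to joint-state indexing.
    approx = marginals[0]
    for m in marginals[1:]:
        approx = np.outer(approx, m).ravel()
    return approx, marginals

# Tiny illustration: a correlated joint belief over two binary state
# variables collapses to the uniform distribution once correlations
# are marginalized away, even though the marginals are preserved.
b_joint = np.array([0.4, 0.1, 0.1, 0.4])
approx, marginals = factored_approximation(b_joint, (2, 2))
```

In the illustration, each marginal is `[0.5, 0.5]`, so the factored belief is uniform over all four joint states; the approximation is cheap to store (sum of variable sizes rather than their product) at the cost of exactly this lost correlation.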