Monte Carlo POMDPs
Neural Information Processing Systems
We present a Monte Carlo algorithm for learning to act in partially observable Markov decision processes (POMDPs) with real-valued state and action spaces. Our approach uses importance sampling for representing beliefs, and Monte Carlo approximation for belief propagation. A reinforcement learning algorithm, value iteration, is employed to learn value functions over belief states. Finally, a sample-based version of nearest neighbor is used to generalize across states. Initial empirical results suggest that our approach works well in practical applications.
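The abstract's first two ingredients can be illustrated with a minimal sketch: a belief over a continuous state is carried as a set of samples, and belief propagation is done by Monte Carlo simulation of the transition model followed by importance weighting under the observation model. Everything below (the 1-D state, Gaussian dynamics and observation noise, all parameter values) is an illustrative assumption, not the paper's experimental setup.

```python
import math
import random

def propagate_belief(particles, action, observation, rng,
                     trans_noise=0.5, obs_noise=1.0):
    """One Monte Carlo belief update: sample successor states from an
    (assumed) transition model, weight them by an (assumed) Gaussian
    observation likelihood, then resample to an unweighted sample set."""
    # 1. Monte Carlo propagation through the transition model.
    proposed = [s + action + rng.gauss(0.0, trans_noise) for s in particles]
    # 2. Importance weights from the observation likelihood.
    weights = [math.exp(-0.5 * ((observation - s) / obs_noise) ** 2)
               for s in proposed]
    total = sum(weights) or 1.0
    weights = [w / total for w in weights]
    # 3. Resample to obtain an unweighted sample-based belief.
    return rng.choices(proposed, weights=weights, k=len(particles))

rng = random.Random(0)
belief = [rng.gauss(0.0, 2.0) for _ in range(500)]  # initial belief samples
belief = propagate_belief(belief, action=1.0, observation=1.2, rng=rng)
mean = sum(belief) / len(belief)
```

A value function learned over such sample sets (the abstract's remaining two ingredients) would then map belief samples to values, with a nearest-neighbor scheme generalizing between belief states encountered during learning.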
Dec-31-2000