Bayes-Adaptive POMDPs

Neural Information Processing Systems 

Bayesian Reinforcement Learning has generated substantial interest recently, as it provides an elegant solution to the exploration-exploitation trade-off in reinforce- ment learning. Our goal is to extend these ideas to the more general Partially Observable MDP (POMDP) framework, where the state is a hidden variable. To address this problem, we in- troduce a new mathematical model, the Bayes-Adaptive POMDP. This new model allows us to (1) improve knowledge of the POMDP domain through interaction with the environment, and (2) plan optimal sequences of actions which can trade- off between improving the model, identifying the state, and gathering reward. We show how the model can be finitely approximated while preserving the value func- tion.