Bayes-Adaptive POMDPs
–Neural Information Processing Systems
Bayesian Reinforcement Learning has generated substantial interest recently, as it provides an elegant solution to the exploration-exploitation trade-off in reinforce- ment learning. Our goal is to extend these ideas to the more general Partially Observable MDP (POMDP) framework, where the state is a hidden variable. To address this problem, we in- troduce a new mathematical model, the Bayes-Adaptive POMDP. This new model allows us to (1) improve knowledge of the POMDP domain through interaction with the environment, and (2) plan optimal sequences of actions which can trade- off between improving the model, identifying the state, and gathering reward. We show how the model can be finitely approximated while preserving the value func- tion.
Neural Information Processing Systems
Apr-6-2023, 14:38:05 GMT
- Technology: