Reviews: Bayesian Control of Large MDPs with Unknown Dynamics in Data-Poor Environments
–Neural Information Processing Systems
POST-REBUTTAL: Thank you for the clarifications. I've increased the overall score because the rebuttal made me see this work as an interesting tradeoff between generality and the point at which the computational cost has to be paid (online vs. offline). At the same time, BAMCP has the significant advantage of handling more stochastic scenarios (as opposed to the nearly deterministic ones considered in this paper), and I would encourage you to work on extending your approach to them.

The paper proposes a Bayesian RL method for MDPs with continuous state, action, and parameter spaces. Its key component is an efficient approximation of the expectation of the Q-value function under the posterior belief over the parameter space, achieved by combining GPTD during the offline stage of the algorithm with MCMC during the online stage.
Oct-7-2024, 07:23:35 GMT