Bayesian learning of the optimal action-value function in a Markov decision process
