Optimal Posterior Sampling for Policy Identification in Tabular Markov Decision Processes
