Adaptive Sampling for Best Policy Identification in Markov Decision Processes
