Thompson Sampling for Learning Parameterized Markov Decision Processes
