(More) Efficient Reinforcement Learning via Posterior Sampling

Osband, Ian; Russo, Daniel; Van Roy, Benjamin

Feb-14-2020, 19:13:16 GMT–Neural Information Processing Systems

Most provably efficient learning algorithms introduce optimism about poorly understood states and actions to encourage exploration. We study an alternative approach for efficient exploration, posterior sampling for reinforcement learning (PSRL). This algorithm proceeds in repeated episodes of known duration. At the start of each episode, PSRL updates its prior distribution over Markov decision processes to a posterior given the data observed so far and takes one sample from this posterior. PSRL then follows the policy that is optimal for this sample during the episode.
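The episode loop described in the abstract can be illustrated with a minimal tabular sketch. It is not the authors' implementation: the Dirichlet prior over transitions, the Beta prior over Bernoulli rewards, the fixed start state 0, and every function name below are assumptions chosen only to keep the example small and dependency-free.

```python
import random


def sample_dirichlet(alphas, rng):
    """Draw one probability vector from Dirichlet(alphas) via normalized gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def solve_finite_horizon(P, R, horizon):
    """Backward induction on a known MDP; returns policy[h][s] -> greedy action."""
    n_states, n_actions = len(R), len(R[0])
    value = [0.0] * n_states
    policy = [None] * horizon
    for h in reversed(range(horizon)):
        q = [[R[s][a] + sum(p * v for p, v in zip(P[s][a], value))
              for a in range(n_actions)] for s in range(n_states)]
        policy[h] = [max(range(n_actions), key=q[s].__getitem__)
                     for s in range(n_states)]
        value = [max(q[s]) for s in range(n_states)]
    return policy


def psrl(true_P, true_R, horizon, n_episodes, seed=0):
    """Run PSRL against a simulated MDP (assumed: Bernoulli rewards, start state 0)."""
    rng = random.Random(seed)
    S, A = len(true_R), len(true_R[0])
    # Assumed priors: Dirichlet(1, ..., 1) on each transition row, Beta(1, 1) on each reward mean.
    trans_counts = [[[1.0] * S for _ in range(A)] for _ in range(S)]
    rew_ab = [[[1.0, 1.0] for _ in range(A)] for _ in range(S)]
    episode_returns = []
    for _ in range(n_episodes):
        # One sample from the current posterior over MDPs.
        P = [[sample_dirichlet(trans_counts[s][a], rng) for a in range(A)]
             for s in range(S)]
        R = [[rng.betavariate(*rew_ab[s][a]) for a in range(A)] for s in range(S)]
        # Follow the policy that is optimal for the sampled MDP for the whole episode.
        policy = solve_finite_horizon(P, R, horizon)
        s, total = 0, 0
        for h in range(horizon):
            a = policy[h][s]
            r = 1 if rng.random() < true_R[s][a] else 0
            s_next = rng.choices(range(S), weights=true_P[s][a])[0]
            # Conjugate posterior updates from the observed transition and reward.
            trans_counts[s][a][s_next] += 1.0
            rew_ab[s][a][0 if r else 1] += 1.0
            total += r
            s = s_next
        episode_returns.append(total)
    return episode_returns, trans_counts


if __name__ == "__main__":
    # Hypothetical two-state, two-action MDP used only for illustration.
    true_P = [[[0.9, 0.1], [0.1, 0.9]], [[0.9, 0.1], [0.1, 0.9]]]
    true_R = [[0.1, 0.0], [0.0, 0.9]]
    returns, _ = psrl(true_P, true_R, horizon=10, n_episodes=200)
    print(sum(returns[-50:]) / 50)
```

Note that the sketch draws a new MDP only once per episode, so the agent commits to a single sampled model for the full episode rather than adding optimism bonuses to uncertain states and actions.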

algorithm, efficient reinforcement learning, posterior sampling

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.85)