Model-based Reinforcement Learning and the Eluder Dimension

Feb-14-2020, 07:56:47 GMT–Neural Information Processing Systems

We consider the problem of learning to optimize an unknown Markov decision process (MDP). We show that, if the MDP can be parameterized within some known function class, we can obtain regret bounds that scale with the dimensionality, rather than cardinality, of the system. These represent the first unified regret bounds for model-based reinforcement learning and provide state-of-the-art guarantees in several important settings. Moreover, we present a simple and computationally efficient algorithm, posterior sampling for reinforcement learning (PSRL), that satisfies these bounds. Papers published at the Neural Information Processing Systems Conference.
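To make the posterior sampling idea concrete, here is a minimal sketch of PSRL for the simplest special case: a small tabular, finite-horizon MDP. It is an illustration under stated assumptions, not the paper's general function-class setting. Rewards are assumed known, the transition prior is assumed to be an independent uniform Dirichlet for each state-action pair, episodes are assumed to start in state 0, and the names `sample_dirichlet`, `plan`, and `psrl` are hypothetical helpers introduced here. Each episode samples one MDP from the posterior, plans optimally for that sample, follows the resulting policy in the true environment, and updates the posterior with the observed transitions.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one sample from Dirichlet(alpha) by normalizing gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def plan(P, R, horizon):
    """Finite-horizon value iteration.

    P[s][a] is a list of next-state probabilities, R[s][a] is a reward.
    Returns policy[h][s], the greedy action at step h in state s.
    """
    n_states, n_actions = len(P), len(P[0])
    V = [0.0] * n_states
    policy = []
    for _ in range(horizon):  # iterate backward from the last step
        Q = [[R[s][a] + sum(p * v for p, v in zip(P[s][a], V))
              for a in range(n_actions)]
             for s in range(n_states)]
        policy.append([max(range(n_actions), key=Q[s].__getitem__)
                       for s in range(n_states)])
        V = [max(Q[s]) for s in range(n_states)]
    policy.reverse()  # policy[0] is now the first step of the episode
    return policy


def psrl(true_P, R, horizon, episodes, seed=0):
    """Tabular PSRL sketch with known rewards (an assumption of this example)."""
    rng = random.Random(seed)
    n_states, n_actions = len(true_P), len(true_P[0])
    # Dirichlet(1, ..., 1) prior: one pseudo-count per possible next state.
    counts = [[[1.0] * n_states for _ in range(n_actions)]
              for _ in range(n_states)]
    returns = []
    for _ in range(episodes):
        # 1. Sample a transition model from the current posterior.
        sampled_P = [[sample_dirichlet(counts[s][a], rng)
                      for a in range(n_actions)]
                     for s in range(n_states)]
        # 2. Plan as if the sampled model were the true one.
        policy = plan(sampled_P, R, horizon)
        # 3. Act in the true MDP for one episode and update the posterior.
        s, total = 0, 0.0
        for h in range(horizon):
            a = policy[h][s]
            total += R[s][a]
            s_next = rng.choices(range(n_states), weights=true_P[s][a])[0]
            counts[s][a][s_next] += 1.0
            s = s_next
        returns.append(total)
    return returns, counts
```

Calling `psrl(true_P, R, horizon, episodes)` returns the per-episode returns and the final Dirichlet counts. Exploration here comes only from the randomness of the posterior sample, which is the design choice that keeps the method simple compared with building explicit confidence sets.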

eluder dimension, model-based reinforcement learning

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)