Model-based Reinforcement Learning and the Eluder Dimension
Neural Information Processing Systems
We consider the problem of learning to optimize an unknown Markov decision process (MDP). We show that, if the MDP can be parameterized within some known function class, we can obtain regret bounds that scale with the dimensionality, rather than cardinality, of the system.
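To make the contrast between dimensionality and cardinality concrete, here is a minimal sketch that counts the free parameters of a transition model under two parameterizations. The linear dynamics model, the bin count, and both function names are illustrative assumptions, not taken from the paper, and a raw parameter count is only a rough proxy for the dimension measures the regret bounds are stated in.

```python
def tabular_transition_params(num_states: int, num_actions: int) -> int:
    """Free parameters of a tabular transition model.

    Each (state, action) pair has its own probability vector over next
    states, with num_states - 1 free entries because it sums to one.
    """
    return num_states * num_actions * (num_states - 1)


def linear_dynamics_params(state_dim: int, action_dim: int) -> int:
    """Free parameters of hypothetical linear dynamics x' = A x + B u.

    A is state_dim x state_dim and B is state_dim x action_dim.
    This model class is an assumption chosen for illustration.
    """
    return state_dim * (state_dim + action_dim)


if __name__ == "__main__":
    # Assumed setup: a 4-dimensional state and 1-dimensional action,
    # each coordinate discretized into 10 bins for the tabular view.
    state_dim, action_dim, bins = 4, 1, 10
    num_states = bins ** state_dim
    num_actions = bins ** action_dim

    print("tabular:", tabular_transition_params(num_states, num_actions))
    print("linear: ", linear_dynamics_params(state_dim, action_dim))
```

Under these assumptions the tabular count grows with the number of discretized states and actions, while the parameterized count depends only on the state and action dimensions.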
Feb-8-2025, 18:06:04 GMT