 Reinforcement Learning





The Value-Equivalence Principle for Model-Based Reinforcement Learning: Supplementary Material

Neural Information Processing Systems

In this supplement we give details of our theoretical results and experiments that had to be left out of the main paper due to space constraints. Section A.1.1 contains derivations of the properties and propositions presented in the main paper. Section A.2 provides a detailed outline of the pipeline used across our experiments. The numbering of equations, figures and citations resumes from what is used in the main paper. ... This result directly follows from Definitions 1 and 2. ... Property 2. M(null, V) either contains m ... We will show the result by contradiction. ... In order to prove Proposition 2 we will need four lemmas, which we state and prove below. ... It follows that H-dim[B] = nm − rank(A) rank(C).


The Value Equivalence Principle for Model-Based Reinforcement Learning

Neural Information Processing Systems

Learning models of the environment from data is often viewed as an essential component to building intelligent reinforcement learning (RL) agents. The common practice is to separate the learning of the model from its use, by constructing a model of the environment's dynamics that correctly predicts the observed state transitions. In this paper we argue that the limited representational resources of model-based RL agents are better used to build models that are directly useful for value-based planning.






Budgeted Reinforcement Learning in Continuous State Space

Neural Information Processing Systems

So far, Budgeted Markov Decision Processes (BMDPs) could only be solved in the case of finite state spaces with known dynamics. This work extends the state of the art to continuous-space environments and unknown dynamics. We show that the solution to a BMDP is a fixed point of a novel Budgeted Bellman Optimality operator. This observation allows us to introduce natural extensions of Deep Reinforcement Learning algorithms to address large-scale BMDPs.