The Value-Equivalence Principle for Model-Based Reinforcement Learning Supplementary Material

Neural Information Processing Systems 

In this supplement we give details of our theoretical results and experiments that had to be left out of the main paper due to space constraints. Section A.1.1 contains derivations of the properties and propositions presented in the main paper. Section A.2 provides a detailed outline of the pipeline used across our experiments in the main paper. The numbering of equations, figures and citations resumes from that used in the main paper.

This result follows directly from Definitions 1 and 2.

Property 2. M(null, V) either contains m

We will show the result by contradiction.

In order to prove Proposition 2 we will need four lemmas, which we state and prove below.

It follows that $\dim[\mathcal{B}] = nm - \operatorname{rank}(A)\,\operatorname{rank}(C)$.
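The dimension count $nm - \operatorname{rank}(A)\operatorname{rank}(C)$ in the proof of Proposition 2 can be sanity-checked numerically. The sketch below rests on an assumption the fragment does not state: that $\mathcal{B}$ is the null space of the linear map $X \mapsto AXC$ acting on $n \times m$ matrices. Under that assumption, column-major vectorisation gives $\mathrm{vec}(AXC) = (C^\top \otimes A)\,\mathrm{vec}(X)$, and $\operatorname{rank}(C^\top \otimes A) = \operatorname{rank}(A)\operatorname{rank}(C)$, which yields the formula. The function names `kernel_dim`, `predicted_dim` and `kernel_basis` are hypothetical and introduced only for illustration.

```python
import numpy as np


def kernel_dim(A: np.ndarray, C: np.ndarray) -> int:
    """Dimension of {X in R^(n x m) : A @ X @ C == 0}, computed numerically.

    Assumes A has shape (p, n) and C has shape (m, q), so X has shape (n, m).
    """
    n, m = A.shape[1], C.shape[0]
    # Column-major vectorisation: vec(A X C) = (C^T kron A) vec(X).
    K = np.kron(C.T, A)
    return n * m - int(np.linalg.matrix_rank(K))


def predicted_dim(A: np.ndarray, C: np.ndarray) -> int:
    """Closed form nm - rank(A) * rank(C)."""
    n, m = A.shape[1], C.shape[0]
    return n * m - int(np.linalg.matrix_rank(A)) * int(np.linalg.matrix_rank(C))


def kernel_basis(A: np.ndarray, C: np.ndarray) -> list:
    """Orthonormal basis of the null space, reshaped back into n x m matrices."""
    n, m = A.shape[1], C.shape[0]
    K = np.kron(C.T, A)
    r = int(np.linalg.matrix_rank(K))
    # Singular values come sorted in descending order, so rows r, r+1, ...
    # of Vt span the null space of K.
    _, _, Vt = np.linalg.svd(K)
    # order="F" undoes the column-stacking used by vec(.).
    return [v.reshape((n, m), order="F") for v in Vt[r:]]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Rank-2 A (4 x 3) and rank-1 C (5 x 6): nm = 3 * 5 = 15,
    # so the formula predicts a kernel of dimension 15 - 2 * 1 = 13.
    A = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 3))
    C = rng.standard_normal((5, 1)) @ rng.standard_normal((1, 6))
    print(kernel_dim(A, C), predicted_dim(A, C))
```

Forming $C^\top \otimes A$ explicitly is only sensible for small $n$ and $m$; it is used here because it makes the rank identity behind the formula directly visible.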