
Akshay Krishnamurthy

Neural Information Processing Systems 

In this appendix we provide proofs for the structural results in the paper. We first prove Lemma 1. In Appendix A.2 we focus on results relating to realizability. Then, in Appendix A.3, we turn to the separation results of Proposition 1 and Proposition 2. Finally, in Appendix A.4, we provide details about the connection to the Bellman and witness ranks. The result shows that if the low rank MDP has M states, then we require n = Ω(M) samples to find a near-optimal policy with moderate probability.