
Feb-7-2025, 15:50:20 GMT–Neural Information Processing Systems

In this appendix we provide proofs for the structural results in the paper. We first prove Lemma 1. In Appendix A.2 we focus on results relating to realizability. Then, in Appendix A.3, we turn to the separation results of Proposition 1 and Proposition 2. Finally, in Appendix A.4, we provide details about the connection to the Bellman and Witness ranks. The result shows that if the low rank MDP has M states, then we require n = Ω(M) samples to find a near-optimal policy with moderate probability.

