Supplement to "Sample-Efficient Reinforcement Learning for Linearly-Parameterized MDPs with a Generative Model" Bingyan Wang
–Neural Information Processing Systems
In this section, we gather the notation that will be used throughout the appendix. Next, we reconsider Assumption 1. In this section, we provide a complete proof of Theorem 1. See Appendix B.2. Applying (21) to (19) reveals that … This justifies the first inequality (33). In this section, we provide a complete proof of Theorem 2. We actually prove a more general … Therefore, we are left to justify (40). Here we adopt the same notation as [4].
Feb-10-2025, 14:25:25 GMT