Supplement to "Sample-Efficient Reinforcement Learning for Linearly-Parameterized MDPs with a Generative Model"
Bingyan Wang

Neural Information Processing Systems 

In this section, we gather the notation used throughout the appendix. Next, we revisit Assumption 1.

In this section, we provide the complete proof of Theorem 1; see Appendix B.2. Applying (21) to (19) reveals that … This justifies the first inequality in (33).

In this section, we provide the complete proof of Theorem 2. We actually prove a more general … Therefore, it remains to justify (40). Here we adopt the same notation as [4].