Supplementary Material for: An Exponential Lower Bound for Linearly-Realizable MDPs with Constant Suboptimality Gap

1 Proof of Lemma 2 Proof

Neural Information Processing Systems 

This is exactly (1) for h. Hence both (1) and (2) hold for all h ∈ [H]. We state a proof of this lemma for completeness. We now check realizability in the new MDP. What remains is to show the statements for all h via induction.
