Supplementary Material for: An Exponential Lower Bound for Linearly-Realizable MDPs with Constant Suboptimality Gap
1 Proof of Lemma 2
–Neural Information Processing Systems
This is exactly (1) for h. Hence both (1) and (2) hold for all h ∈ [H]. We state a proof of this lemma for completeness. We now check realizability in the new MDP. What remains is to show the statements for all h via induction.
Nov-14-2025, 02:58:03 GMT