Supplementary Material for: An Exponential Lower Bound for Linearly-Realizable MDPs with Constant Suboptimality Gap

Apr-25-2026, 20:32:50 GMT–Neural Information Processing Systems

We first verify the statement for the terminal state $f$. Observe that at the terminal state $f$, regardless of the action taken, the next state is always $f$ and the reward is always 0. Hence $Q^*_h(f, \cdot) = V^*_h(f) = 0$ for all $h \in [H]$. Thus $Q^*_h(f, \cdot) = \langle \phi(f, \cdot), v(a^*) \rangle = 0$. We now verify realizability for other states via induction on $h = H, H-1, \ldots, 1$. Next, note that for every $h$, (2) follows from (1). In other words, (1) implies that $a^*$ is always the optimal action.
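
The terminal-state step can be spelled out as a short backward recursion. This is only a sketch under stated assumptions: the convention $V^*_{H+1} \equiv 0$, the reward symbol $r$, and the condition $\phi(f, a) = 0$ in the last line are not given in the excerpt above and are introduced here for illustration. The optimal action is written as $a^*$.

```latex
% Assumption: V^*_{H+1}(f) = 0 (values vanish past the horizon).
% From the text: at f every action yields reward 0 and leads back to f.
\begin{align*}
Q^*_h(f, a) &= r(f, a) + V^*_{h+1}(f) = 0 + V^*_{h+1}(f), \\
V^*_h(f)    &= \max_{a} Q^*_h(f, a) = V^*_{h+1}(f).
\end{align*}
% Unrolling from h = H down to h = 1 with V^*_{H+1}(f) = 0:
\begin{align*}
V^*_h(f) = Q^*_h(f, a) = 0 \quad \text{for all } h \in [H] \text{ and all } a.
\end{align*}
% Assumption: the feature map vanishes at f, i.e. \phi(f, a) = 0.
% Then the linear form matches the zero value:
\begin{align*}
\langle \phi(f, a), v(a^*) \rangle = \langle 0, v(a^*) \rangle = 0 = Q^*_h(f, a).
\end{align*}
```

Any feature map with $\langle \phi(f, a), v(a^*) \rangle = 0$ would serve equally well in the last step; setting $\phi(f, a)$ to the zero vector is simply the most direct way to satisfy it.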



