Efficient Planning in Large MDPs with Weak Linear Function Approximation
Roshan Shariff & Csaba Szepesvári
Neural Information Processing Systems
For the convenience of the reader, we have collected the most frequently used symbols and their meanings in the following table: $\mathbb{R}$, $\mathbb{R}_+$: real numbers; non-negative real numbers.

Now observe that the constraint [) [{ is equivalent to the constraint ] K) ] K{ in the definition of LRA; both of them require that >B) { B for B S.

For any ˆ, R<+ and distribution over states - ΔS, suppose ) R3 is feasible for (LRALP-) and at most B-suboptimal.

Substituting this into the last inequality and rearranging gives the desired result.

Then the results of running Algorithm 1 for ) iterations satisfy Yopt E[X(ˆ,, ˆ))] 14 3) .
Feb-10-2026, 18:42:39 GMT