Efficient Planning in Large MDPs with Weak Linear Function Approximation

Roshan Shariff & Csaba Szepesvári

Neural Information Processing Systems 

For the convenience of the reader, we have collected the most frequently used symbols and their meanings in the following table: R, R+: real numbers; non-negative real numbers. Now observe that the constraint [) [{ is equivalent to the constraint ] K) ] K{ in the definition of LRA; both of them require that >B) { B for B S. For any ˆ, R<+ and distribution over states - ΔS, suppose ) R3 is feasible for (LRALP-) and at most B-suboptimal. Substituting this into the last inequality and rearranging gives the desired result. Then the results of running Algorithm 1 for ) iterations satisfy Yopt E[X(ˆ,, ˆ))] 14 3) .
