Supplement to "Sample-Efficient Reinforcement Learning for Linearly-Parameterized MDPs with a Generative Model"

Neural Information Processing Systems 

In addition, we define 1 to be a vector with all the entries being 1, and I to be the identity matrix. Suppose that δ > 0 and ε ∈ (0, (1 − γ)^{−1/2}]. The remainder of this section is devoted to proving Theorem 3. VT) to be the policy (resp. The remainder of this section is devoted to proving Theorem 4.
