Supplementto" Sample-EfficientReinforcement LearningforLinearly-ParameterizedMDPs withaGenerativeModel "
–Neural Information Processing Systems
In addition, we define1 to be a vector with all the entries being 1, andI be the identity matrix. Suppose thatδ > 0andε (0,(1 γ) 1/2]. The remainder of this section is devotedtoprovingTheorem3. VT) to be the policy (resp. The remainder of this section is devotedtoprovingTheorem4.
Neural Information Processing Systems
Feb-11-2026, 00:46:26 GMT