Appendix for On Effective Scheduling of Model-based Reinforcement Learning

Apr-25-2026, 00:31:26 GMT–Neural Information Processing Systems

We call c(m) the m-step concentrability of a future-state distribution, and we call C_{ρ,µ} the discounted-average concentrability coefficient of the future-state distributions. The class of MDPs that satisfies this concentrability assumption is quite large, as further discussed in Munos and Szepesvári [18]. If X_i, i = 1,...,N is an i.i.d. [...] And when q = 1, N is used instead of N1. From the definition, one can easily see that Nq,FX1:N N. Lemma A.2 (Single Iteration Error Bound). Let V_k and V_{k+1} be the value functions of iterations k and k+1, and V_max = r_max/(1 − γ).
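The quantity V_max in Lemma A.2 is the usual geometric-series bound on a discounted return: if every reward has magnitude at most r_max and the discount factor γ lies in [0, 1), then no discounted sum of rewards can exceed r_max/(1 − γ). The following is a minimal sketch of that arithmetic, not code from the paper; the function names `v_max` and `discounted_return` and the numeric values are hypothetical and chosen only for illustration.

```python
def v_max(r_max: float, gamma: float) -> float:
    """Geometric-series bound r_max / (1 - gamma) on a discounted return."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    return r_max / (1.0 - gamma)


def discounted_return(rewards, gamma: float) -> float:
    """Finite-horizon discounted sum: sum over t of gamma**t * rewards[t]."""
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r
    return total


# Illustrative values (assumptions, not taken from the paper).
bound = v_max(r_max=1.0, gamma=0.9)
# Worst case over 50 steps: the maximal reward is received at every step.
worst_case = discounted_return([1.0] * 50, gamma=0.9)
print(worst_case < bound)
```

The finite-horizon sum stays strictly below the bound because the geometric series is truncated; the bound is only approached as the horizon grows.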

artificial intelligence, machine learning, reinforcement learning, (17 more...)



Genre:
- Research Report > Experimental Study (0.47)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.40)
