Appendices ASketchofTheoreticalAnalyses
–Neural Information Processing Systems
Theorem B.1 (Performance difference bound for Model-based RL). Mi denote the inconsistency between the learned dynamics PMi and the true dynamics, i.e. ϵ For L1 L3, with the performance gap approximation of M1 and π1, we apply Lemma C.2, and Here, dπMi denotes the distribution of state-action pair induced by policy π under the dynamical modelMi. Theorem B.3 (Refined bound with constraints). Let µ and v be two probability distributions on the configuration space X, according to LemmaC.1,thenwehaveDTV(µ Under these definitions, we can yield the following intermediate outcome by applying the results from B.2and B.1 Here, we take the time-varying linear quadratic regulator as an instance for illustrating the rationality of our assumption on α.
Neural Information Processing Systems
Feb-10-2026, 19:51:29 GMT