A Additional related works

Neural Information Processing Systems 

The reader is also referred to the summaries of recent literature in Du et al. (2020a, 2021). Y ang and Wang (2019); Jin et al. (2020) proposed the linear MDP model, which can Du et al. (2020a) considered the policy completeness assumption, which assumes that the Q-functions of all policies reside within a function class that contains Du et al. (2020a); Lattimore et al. (2020) examined how the model misspecification error propagates Du et al. (2020b) showed that sample-efficient RL is feasible in deterministic systems, which has been extended to stochastic systems with low variance in Du et al. (2019) under additional gap assumptions. In addition, Weisz et al. (2021b) established exponential sample complexity lower Weisz et al. (2021a) provided a sample-efficient algorithm when only Recently, Du et al. (2021) introduced the bilinear Before proceeding, we introduce several convenient notation to be used throughout the proof. Lemma 1. F or all 1 k K and 1 h H, one has I It thus comes down to bounding the right-hand side of (27). Lemma 2. Suppose that c (see Lemma 1).

Similar Docs  Excel Report  more

TitleSimilaritySource
None found