A Upper Bound with Gap-dependent Analysis
Neural Information Processing Systems
We begin with the proof of the thresholding technique.

A.1 Definitions

We first restate the notations.

Definition A.2 (Pessimistically estimated MDP). For a given successful pessimistic algorithm execution instance, where the arguments in Definition A.1 are simultaneously satisfied, we call ...

In the following proof of Corollary A.1, we will set ... A rigorous proof is deferred to Appendix A.4. With Theorem A.1, we just need to prove that ... The following lemmas will be frequently used throughout the proof of Theorem A.1 and upper ...

The algorithm used here is Lower Confidence Bound Value Iteration (VI-LCB) [Xie et al., 2021b] with ... The basic idea of LCB is to pessimistically estimate the Q function so that the algorithm won't overestimate some hardly seen suboptimal actions in ... The subsampling trick introduced by Li et al. [2022] helps solve the independence problem ... Superscripts stand for the dataset. See Li et al. [2022] for a more detailed description of the algorithm.
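To make the pessimism idea concrete, here is a minimal Python sketch of LCB-style value iteration on a tabular, finite-horizon MDP. It is an illustration under stated assumptions, not the algorithm analyzed in the paper: the function name `vi_lcb`, the Hoeffding-style penalty, and the parameters `c_b` and `delta` are hypothetical choices, rewards are assumed known and bounded in [0, 1], and the subsampling step of Li et al. [2022] is omitted.

```python
import numpy as np


def vi_lcb(counts, rewards, H, c_b=1.0, delta=0.1):
    """Illustrative LCB value iteration (hypothetical sketch).

    counts[h, s, a, s2]: number of observed transitions at step h.
    rewards[h, s, a]:    rewards in [0, 1], assumed known here.
    c_b, delta:          assumed penalty constant and failure probability.
    """
    _, S, A, _ = counts.shape
    n_sa = counts.sum(axis=3)  # visit counts N_h(s, a)
    # Empirical transition kernel; unseen (s, a) pairs get an all-zero row.
    p_hat = counts / np.maximum(n_sa, 1)[..., None]
    log_term = np.log(H * S * A / delta)

    V = np.zeros((H + 1, S))          # V[H] = 0 is the terminal value
    pi = np.zeros((H, S), dtype=int)  # greedy policy w.r.t. pessimistic Q
    for h in range(H - 1, -1, -1):
        # Penalty grows as the visit count shrinks, so rarely seen
        # actions receive a low (pessimistic) Q estimate.
        penalty = c_b * H * np.sqrt(log_term / np.maximum(n_sa[h], 1))
        q = rewards[h] + p_hat[h] @ V[h + 1] - penalty
        q = np.maximum(q, 0.0)        # clip the lower bound at zero
        pi[h] = q.argmax(axis=1)
        V[h] = q.max(axis=1)
    return pi, V
```

With this penalty, an action observed only once is heavily discounted even if its empirical reward is slightly higher, so the returned policy prefers well-covered actions. That is the behavior the text describes as not overestimating hardly seen suboptimal actions.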
Aug-15-2025, 05:35:51 GMT