technique for analyzing SA using the smoothed Lyapunov function is applicable for developing bounds for RL that
–Neural Information Processing Systems
R1: Title is too general: We will change the title accordingly. V-trace algorithm, they are from the original paper [17], and we do not make any additional assumptions. Our joint analysis of both is the key to our recursion (Proposition 2.1). Q-learning, V-trace, etc. can all be modeled as SA with a contraction operator and martingale difference noise [5]. Thus, our result is a broad tool for establishing finite-sample error bounds for various RL algorithms.
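To make the SA model mentioned above concrete, here is a minimal sketch of the generic iteration x_{k+1} = x_k + alpha_k (H(x_k) - x_k + w_k). It is an illustration only and is not taken from the paper: the scalar operator `contraction`, the step-size schedule, and the Gaussian noise are all assumptions chosen to keep the example small. In Q-learning, H would be the Bellman optimality operator and w_k the sampling noise.

```python
import random


def contraction(x):
    # Hypothetical operator H(x) = 0.5 * x + 1 (an assumption for illustration).
    # It is a contraction with modulus 0.5 and unique fixed point x* = 2.
    return 0.5 * x + 1.0


def run_sa(num_steps=20000, x0=10.0, seed=0):
    """Run the SA iteration x <- x + alpha_k * (H(x) - x + w_k) and return the final iterate."""
    rng = random.Random(seed)
    x = x0
    for k in range(num_steps):
        # Diminishing step size; this particular schedule is an illustrative choice.
        alpha = 2.0 / (k + 2)
        # i.i.d. zero-mean Gaussian noise, a simple special case of
        # martingale difference noise.
        w = rng.gauss(0.0, 1.0)
        x = x + alpha * (contraction(x) - x + w)
    return x


if __name__ == "__main__":
    # Distance of the final iterate from the fixed point x* = 2.
    print(abs(run_sa() - 2.0))
```

Under these assumptions the iterate settles near the fixed point of H despite the noise. A finite-sample bound quantifies how quickly that error shrinks as a function of the number of steps and the step-size schedule.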
Nov-14-2025, 02:33:52 GMT