Reviews: Finite-Time Performance Bounds and Adaptive Learning Rate Selection for Two Time-Scale Reinforcement Learning
–Neural Information Processing Systems
Here are few suggestions for expanding your presentation: 1. The literature review needs to be broadened. In particular, you should discuss the work of Liu et al., JAIR 2019 (Proximal Gradient TD Learning), which analyzes two time-scale algorithms that include a proximal gradient step. The results in that paper show improved finite sample bounds over classic gradient TD methods, like GTD2. How do your results compare with those in that paper, and in particular, can your analysis be extended to GTD2-MP (the mirror-prox variant of GTD2, which has an improved finite sample convergence rate compared to GTD2. 2. Two time scale algorithms are somewhat more complex than the standard TD method, and Sutton et al. and others have developed a variant of TD called emphatic TD (JMLR 2016: "An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning") that is stable under off-policy training.
Neural Information Processing Systems
Jan-27-2025, 16:55:05 GMT
- Technology: