Reviews: Two Time-scale Off-Policy TD Learning: Non-asymptotic Analysis over Markovian Samples

Jan-27-2025, 20:00:53 GMT–Neural Information Processing Systems

The results are new and important to the field, and the analysis in this setting appears nontrivial. In addition, the paper develops a new variant of TDC with a blockwise diminishing stepsize and proves that it converges asymptotically to an arbitrarily small training error at a linear rate. Extensive experiments demonstrate that the new TDC variant converges as fast as vanilla TDC with a constant stepsize while achieving accuracy comparable to TDC with a diminishing stepsize. Overall, the paper has both analytical and practical value. However, the following issues need to be addressed. Markovian sample paths have been studied in, e.g., [30,34].
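For readers unfamiliar with the blockwise diminishing stepsize mentioned in the review, the idea can be sketched roughly as follows: both stepsizes are held constant within a block of iterations and shrunk between blocks. This is only an illustrative sketch under stated assumptions, not the paper's algorithm or schedule. The function name `tdc_blockwise`, the geometric `decay` factor, the fixed `block_len`, the per-transition importance weights `rho`, and the toy three-state chain are all hypothetical choices made for the example, and the projection step used in the paper's analysis is omitted.

```python
import numpy as np

def tdc_blockwise(features, P, rewards, rho, gamma=0.9,
                  alpha0=0.1, beta0=0.2, decay=0.5,
                  num_blocks=5, block_len=2000, seed=0):
    """Two time-scale TDC with linear features on a single Markovian path.

    Assumptions (not from the paper): stepsizes are constant inside each
    block and multiplied by `decay` between blocks; `rho[s, s_next]` is a
    hypothetical per-transition importance weight; no projection is applied.
    """
    rng = np.random.default_rng(seed)
    n_states, d = features.shape
    theta = np.zeros(d)   # slow time-scale iterate (value weights)
    w = np.zeros(d)       # fast time-scale iterate (correction weights)
    s = 0
    alpha, beta = alpha0, beta0
    for _ in range(num_blocks):
        for _ in range(block_len):
            # Sample the next state from the behavior chain (Markovian sample).
            s_next = rng.choice(n_states, p=P[s])
            phi, phi_next = features[s], features[s_next]
            r = rho[s, s_next]
            # TD error under the current value estimate.
            delta = rewards[s] + gamma * (phi_next @ theta) - phi @ theta
            # TDC updates: both use the old (theta, w) pair.
            theta_new = theta + alpha * r * (delta * phi - gamma * phi_next * (phi @ w))
            w = w + beta * (r * delta - phi @ w) * phi
            theta = theta_new
            s = s_next
        # Shrink both stepsizes once per block.
        alpha *= decay
        beta *= decay
    return theta

# Toy usage: a 3-state chain with tabular features and unit importance weights.
P = np.array([[0.5, 0.5, 0.0],
              [0.25, 0.5, 0.25],
              [0.0, 0.5, 0.5]])
features = np.eye(3)
rewards = np.array([0.0, 1.0, 0.0])
rho = np.ones((3, 3))
theta = tdc_blockwise(features, P, rewards, rho)
```

Setting `decay=1.0` recovers a constant stepsize, which gives a simple way to compare the two regimes the review contrasts.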

non-asymptotic analysis, stepsize, time-scale off-policy TD learning

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.57)