Variance-Reduced Off-Policy TDC Learning: Non-Asymptotic Convergence Analysis

Dec-24-2025, 10:18:23 GMT–Neural Information Processing Systems

Variance reduction techniques have been successfully applied to temporal-difference (TD) learning, where they improve the sample complexity of policy evaluation. However, existing work has applied variance reduction either to the less popular one time-scale TD algorithm or to the two time-scale GTD algorithm with only a finite number of i.i.d. samples, and both algorithms apply only to the on-policy setting. In this work, we develop a variance reduction scheme for the two time-scale TDC algorithm in the off-policy setting and analyze its non-asymptotic convergence rate over both i.i.d. and Markovian samples.
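To make the idea concrete, the following is a minimal sketch of one epoch of an SVRG-style variance-reduced TDC update with linear function approximation. It is an illustration under stated assumptions, not the paper's exact algorithm: the function names, the sample layout (phi, phi_next, reward, rho), the uniform resampling in the inner loop, and the omission of any projection or iterate averaging are all choices made here for brevity. The per-sample directions follow the standard TDC form, with theta as the slow-time-scale value parameter, w as the fast-time-scale auxiliary parameter, and rho as the importance sampling ratio between target and behavior policies.

```python
import numpy as np

def tdc_directions(theta, w, phi, phi_next, reward, rho, gamma):
    """Per-sample TDC update directions for theta (slow) and w (fast)."""
    delta = reward + gamma * (phi_next @ theta) - phi @ theta  # TD error
    g_theta = rho * (delta * phi - gamma * (phi @ w) * phi_next)
    g_w = rho * delta * phi - (phi @ w) * phi
    return g_theta, g_w

def batch_directions(theta, w, batch, gamma):
    """Average the per-sample directions over a batch of transitions."""
    pairs = [tdc_directions(theta, w, *s, gamma) for s in batch]
    mean_theta = np.mean([p[0] for p in pairs], axis=0)
    mean_w = np.mean([p[1] for p in pairs], axis=0)
    return mean_theta, mean_w

def vr_tdc_epoch(theta, w, batch, alpha, beta, gamma, rng):
    """One SVRG-style epoch (hypothetical helper, not from the paper).

    The reference point is frozen at the epoch start, and every inner
    step corrects its stochastic direction with the batch average
    evaluated at that reference point.
    """
    theta_ref, w_ref = theta.copy(), w.copy()
    mean_theta, mean_w = batch_directions(theta_ref, w_ref, batch, gamma)
    for _ in range(len(batch)):
        s = batch[rng.integers(len(batch))]  # assumption: uniform resampling
        g_t, g_w = tdc_directions(theta, w, *s, gamma)
        g_t_ref, g_w_ref = tdc_directions(theta_ref, w_ref, *s, gamma)
        # Two time scales: alpha drives theta, beta drives w.
        theta = theta + alpha * (g_t - g_t_ref + mean_theta)
        w = w + beta * (g_w - g_w_ref + mean_w)
    return theta, w
```

A caller would collect a batch of off-policy transitions, run vr_tdc_epoch repeatedly with step sizes alpha and beta, and read the value estimate for a state as phi @ theta. For Markovian samples, the inner loop would more naturally walk the trajectory in order rather than resample uniformly.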
