Variance-Reduced Off-Policy TDC Learning: Non-Asymptotic Convergence Analysis
