Variance-Reduced Off-Policy TDC Learning: Non-Asymptotic Convergence Analysis

Neural Information Processing Systems 

Recently, several works have proposed applying the variance reduction techniques developed in the stochastic optimization literature to reduce the variance of TD learning.
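As an illustration only, here is one way a stochastic-optimization variance reduction technique can be applied to TD learning. It uses an SVRG-style control variate on plain linear TD(0) over a fixed batch of transitions. This is not the variance-reduced TDC algorithm analyzed in this paper: TDC (TD with gradient correction) also maintains an auxiliary weight vector and a correction term, and both are omitted here. The function names, the toy Markov chain, the step size `alpha`, the discount `gamma`, and the epoch length are all hypothetical choices made for this sketch.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def td_direction(theta, transition, gamma):
    """Semi-gradient TD(0) direction for one transition, assuming V(s) = phi(s) . theta."""
    phi_s, reward, phi_next = transition
    delta = reward + gamma * dot(phi_next, theta) - dot(phi_s, theta)  # TD error
    return [delta * p for p in phi_s]


def batch_direction(theta, batch, gamma):
    """Average TD(0) direction over the whole batch (analogue of a full gradient)."""
    total = [0.0] * len(theta)
    for tr in batch:
        total = [a + b for a, b in zip(total, td_direction(theta, tr, gamma))]
    return [a / len(batch) for a in total]


def svrg_td(batch, dim, gamma=0.5, alpha=0.05, epochs=50, seed=0):
    """Illustrative SVRG-style variance-reduced TD(0); NOT the paper's VRTDC method."""
    rng = random.Random(seed)
    theta = [0.0] * dim
    for _ in range(epochs):
        ref = list(theta)                              # snapshot point for this epoch
        ref_mean = batch_direction(ref, batch, gamma)  # one full pass per epoch
        for _ in range(len(batch)):
            tr = rng.choice(batch)
            g_cur = td_direction(theta, tr, gamma)
            g_ref = td_direction(ref, tr, gamma)
            # Control variate: same expectation as g_cur, but its noise is
            # proportional to (theta - ref), so it shrinks as theta nears ref.
            theta = [t + alpha * (gc - gr + m)
                     for t, gc, gr, m in zip(theta, g_cur, g_ref, ref_mean)]
    return theta


def make_toy_batch(n_states=4, length=400, seed=1):
    """Hypothetical toy data: one trajectory of a small Markov chain, one-hot features."""
    rng = random.Random(seed)

    def one_hot(s):
        return [1.0 if i == s else 0.0 for i in range(n_states)]

    batch, s = [], 0
    for _ in range(length):
        # Step to the next state with prob. 0.5, otherwise jump to a uniformly random state.
        s_next = (s + 1) % n_states if rng.random() < 0.5 else rng.randrange(n_states)
        reward = 1.0 if s_next == 0 else 0.0
        batch.append((one_hot(s), reward, one_hot(s_next)))
        s = s_next
    return batch


if __name__ == "__main__":
    batch = make_toy_batch()
    theta = svrg_td(batch, dim=4)
    print("theta:", theta)
    print("batch TD residual:", batch_direction(theta, batch, 0.5))
```

In this toy setting, each epoch pays for one full pass over the batch to compute `ref_mean`. In exchange, the noise in each inner update is proportional to `theta - ref`. This lets a constant step size drive the averaged TD residual toward zero instead of leaving it at a step-size-dependent noise floor, as plain constant-step TD(0) would.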
