Importance Resampling for Off-policy Prediction

Neural Information Processing Systems 

Though unbiased, IS can be high-variance. A lower variance alternative is Weighted IS (WIS).

Figure 4: Learning rate sensitivity plots in the Random Walk Markov Chain, with buffer size n = 15000 and mini-batch size k = 16.
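The contrast between IS and WIS can be illustrated with a minimal sketch. The two-action problem below, its probabilities and rewards, and all function names are hypothetical choices made for illustration; they are not taken from the paper, and this is not the Random Walk Markov Chain experiment. Only the batch size of 16 echoes the mini-batch size mentioned in the figure caption.

```python
import random
import statistics


def ordinary_is(values, ratios):
    """Ordinary IS: average of ratio-weighted values (unbiased)."""
    return sum(r * v for r, v in zip(ratios, values)) / len(values)


def weighted_is(values, ratios):
    """Weighted IS: normalize by the sum of ratios (biased, lower variance)."""
    total = sum(ratios)
    if total <= 0.0:
        return 0.0  # assumption: fall back to zero when no sample has weight
    return sum(r * v for r, v in zip(ratios, values)) / total


# Hypothetical toy problem: two actions with deterministic rewards.
BEHAVIOR = (0.9, 0.1)  # probabilities used to collect data
TARGET = (0.1, 0.9)    # probabilities of the policy being evaluated
REWARD = (1.0, 2.0)    # reward for action 0 and action 1


def sample_batch(rng, k):
    """Draw k actions from the behavior policy; return rewards and ratios."""
    values, ratios = [], []
    for _ in range(k):
        action = 0 if rng.random() < BEHAVIOR[0] else 1
        values.append(REWARD[action])
        ratios.append(TARGET[action] / BEHAVIOR[action])
    return values, ratios


def compare_variance(trials=2000, k=16, seed=0):
    """Return the empirical variance of the IS and WIS batch estimates."""
    rng = random.Random(seed)
    is_estimates, wis_estimates = [], []
    for _ in range(trials):
        values, ratios = sample_batch(rng, k)
        is_estimates.append(ordinary_is(values, ratios))
        wis_estimates.append(weighted_is(values, ratios))
    return (statistics.pvariance(is_estimates),
            statistics.pvariance(wis_estimates))


if __name__ == "__main__":
    var_is, var_wis = compare_variance()
    print("IS variance: ", var_is)
    print("WIS variance:", var_wis)
```

In this sketch the WIS estimate is a convex combination of the observed rewards, so it always stays between the smallest and largest reward in the batch, whereas the IS estimate can fall far outside that range when a large ratio is sampled. That boundedness is one way to see why WIS trades a small bias for lower variance.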
