cf9a242b70f45317ffd281241fa66502-AuthorFeedback.pdf

Neural Information Processing Systems 

We thank the reviewers for their close reading of the paper and helpful feedback. For example, one can use the density ratio estimates provided by DualDICE to modify (importance-weight) the off-policy data distribution before passing it to a policy gradient or Q-learning method. The figures are overall too small... In Figure 2 the x-axis label is missing. The x-axis is training step.
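
To make the importance-weighting remark above concrete, here is a minimal sketch in plain Python. It assumes the density ratio estimates have already been computed by some estimator such as DualDICE (which is not implemented here); the function names, the toy transitions, and the ratio values are all hypothetical and chosen only for illustration. It shows two common ways to apply such weights: a self-normalized weighted average, and resampling the off-policy data in proportion to the weights before handing it to a downstream learner.

```python
import random


def normalize_weights(ratios):
    """Self-normalize nonnegative density-ratio estimates so they sum to 1."""
    total = sum(ratios)
    if total <= 0:
        raise ValueError("density ratios must have a positive sum")
    return [r / total for r in ratios]


def weighted_average_reward(rewards, ratios):
    """Self-normalized importance-weighted estimate of the average reward."""
    weights = normalize_weights(ratios)
    return sum(w * r for w, r in zip(weights, rewards))


def resample_transitions(transitions, ratios, num_samples, seed=0):
    """Draw transitions with probability proportional to their ratio estimate."""
    rng = random.Random(seed)
    return rng.choices(transitions, weights=ratios, k=num_samples)


# Hypothetical toy data: (state, action, reward) tuples and made-up ratios.
transitions = [("s0", "a0", 1.0), ("s1", "a1", 0.0), ("s2", "a0", 2.0)]
ratios = [0.5, 0.0, 1.5]  # assumed output of a density-ratio estimator

rewards = [t[2] for t in transitions]
estimate = weighted_average_reward(rewards, ratios)

# Reweighted batch that could be passed to a policy gradient or Q-learning update.
batch = resample_transitions(transitions, ratios, num_samples=4)
```

Resampling leaves the downstream learner unchanged, while multiplying per-sample losses by the normalized weights is an alternative that avoids the extra sampling noise; which is preferable depends on the method being used.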
