acrobot
cf9a242b70f45317ffd281241fa66502-AuthorFeedback.pdf
We thank the reviewers for their close reading of the paper and helpful feedback. For example, one can use the density ratio estimates provided by DualDICE to modify (importance-weight) the off-policy data distribution before passing it to a policy gradient or Q-learning method. The figures are overall too small... In Figure 2 the x-axis label is missing. The x-axis is the training step.
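To illustrate the importance-weighting idea mentioned in the response, here is a minimal sketch, not the authors' method. It assumes tabular Q-values stored in a dict and one precomputed ratio estimate per logged transition, approximating d^pi(s, a) / d^D(s, a), as a density ratio estimator such as DualDICE would supply. The function name `weighted_td_loss` and the choice to self-normalize the weights are hypothetical, illustrative choices.

```python
from typing import Dict, List, Tuple

# Hypothetical transition layout: (state, action, reward, next_state).
Transition = Tuple[int, int, float, int]


def weighted_td_loss(
    transitions: List[Transition],
    ratios: List[float],
    q: Dict[Tuple[int, int], float],
    actions: List[int],
    gamma: float = 0.99,
) -> float:
    """Self-normalized, importance-weighted mean squared TD error.

    ratios[i] is an assumed estimate of d^pi(s_i, a_i) / d^D(s_i, a_i),
    e.g. from a density ratio estimator; missing Q entries default to 0.
    """
    if len(transitions) != len(ratios):
        raise ValueError("need exactly one ratio per transition")
    if any(w < 0 for w in ratios):
        raise ValueError("density ratios must be non-negative")
    total_w = sum(ratios)
    if total_w <= 0:
        raise ValueError("ratios must have a positive sum")

    loss = 0.0
    for (s, a, r, s_next), w in zip(transitions, ratios):
        # Standard Q-learning target: r + gamma * max_b Q(s', b).
        target = r + gamma * max(q.get((s_next, b), 0.0) for b in actions)
        td_error = target - q.get((s, a), 0.0)
        # Reweight each logged sample toward the target policy's distribution.
        loss += w * td_error ** 2
    return loss / total_w
```

Self-normalizing by the sum of the ratios keeps the loss on the same scale as an unweighted mean. Passing all-ones ratios recovers the ordinary off-policy TD loss.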
Country:
- North America > Canada (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Technology: Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.40)