cc4d9cfc45325e460b455a820d5f212c-Supplemental-Conference.pdf

Neural Information Processing Systems 

The observations are based on proprioception and on egocentric vision. The observations are based on proprioception, sword position and orientation, and hole position. Note that these environments are related to the domains that have been proposed for use in offline RL benchmarks [Gulcehre et al., 2020]; however, the experiments we perform in this work require availability of the expert policy, so we do not use offline data, but instead train new experts and perform experiments in the very low data regime. The main method we consider for the offline expert cloning experiments is APC, described in Section 3 of the main paper. We can also consider resampling a new action, but we empirically found that cross-entropy worked better; see Appendix 8.6.
