–Neural Information Processing Systems
The observations are based on proprioception and on egocentric vision. The observations are based on proprioception, the sword's position and orientation, and the hole's position. Note that these environments are related to the domains that have been proposed for use in offline RL benchmarks [Gulcehre et al., 2020]. However, the experiments in this work require access to the expert policy, so we do not use offline data. Instead, we train new experts and run experiments in the very-low-data regime. For the offline expert-cloning experiments, the main method we consider is APC, described in Section 3 of the main paper. We could instead resample a new action, but we empirically found that the cross-entropy worked better (see Appendix 8.6).
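The comparison between resampling an action and using the cross-entropy can be illustrated with a small sketch. It rests on assumptions that the text above does not state: both the expert and the cloned policy are diagonal Gaussians, and "cross-entropy" means H(π_E, π_θ) = -E_{a~π_E}[log π_θ(a)]. Under those assumptions, the cross-entropy has a closed form, while resampling actions from the expert gives an unbiased but noisy Monte Carlo estimate of the same quantity. All function names and numbers are hypothetical, and this is not the APC implementation.

```python
import math
import random


def gaussian_log_prob(a, mean, std):
    """Log-density of a diagonal Gaussian N(mean, diag(std**2)) at action a."""
    return sum(
        -0.5 * math.log(2.0 * math.pi * s * s) - (x - m) ** 2 / (2.0 * s * s)
        for x, m, s in zip(a, mean, std)
    )


def cross_entropy_closed_form(mean_e, std_e, mean_s, std_s):
    """Exact H(pi_E, pi_S) = -E_{a~pi_E}[log pi_S(a)] for diagonal Gaussians."""
    return sum(
        0.5 * math.log(2.0 * math.pi * ss * ss)
        + (se * se + (me - ms) ** 2) / (2.0 * ss * ss)
        for me, se, ms, ss in zip(mean_e, std_e, mean_s, std_s)
    )


def cross_entropy_resampled(mean_e, std_e, mean_s, std_s, num_samples, rng):
    """Monte Carlo estimate: resample actions from the expert, score them under the student."""
    total = 0.0
    for _ in range(num_samples):
        # Draw one action from the (hypothetical) Gaussian expert.
        a = [rng.gauss(m, s) for m, s in zip(mean_e, std_e)]
        total -= gaussian_log_prob(a, mean_s, std_s)
    return total / num_samples


# Hypothetical 2-D action distributions at a single state.
rng = random.Random(0)
expert_mean, expert_std = [0.3, -0.1], [0.2, 0.5]
student_mean, student_std = [0.0, 0.0], [0.4, 0.4]

exact = cross_entropy_closed_form(expert_mean, expert_std, student_mean, student_std)
one_sample = cross_entropy_resampled(
    expert_mean, expert_std, student_mean, student_std, num_samples=1, rng=rng
)
many_samples = cross_entropy_resampled(
    expert_mean, expert_std, student_mean, student_std, num_samples=20000, rng=rng
)
print(f"closed form: {exact:.4f}  1 sample: {one_sample:.4f}  20000 samples: {many_samples:.4f}")
```

In this sketch, the single-sample estimate can land far from the closed-form value, while the average over many resampled actions converges to it. A lower-variance training signal is one plausible reason a cross-entropy objective could outperform resampling when data is scarce. This is offered as intuition only, not as the explanation given in Appendix 8.6.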
Feb-11-2026, 22:58:18 GMT