

5ec4e93f2cec19d47ef852a0e1fb2c48-Supplemental-Conference.pdf

Neural Information Processing Systems

A.1 Additional Method Justification The key idea of QWALE is to lead the agent to nearby states within the distribution of the prior data if it is out of distribution, and to nearby states closer to task completion if it is in distribution. This problem has been studied in stochastic optimal control, particularly REPS [Peters et al., 2010]. We use this update for all our evaluated methods online in order to improve stability. For all experiments using prior data collected through RL, the agent was initialized at test time with the pretrained policy and critic. The details for this environment are in [Sharma et al., 2021b].




This problem has been studied in stochastic optimal control, particularly REPS [Peters et al., 2010]. In our experiments, we use soft actor-critic [Haarnoja et al., 2018] as our base RL algorithm. The policy and critic networks are MLPs with 2 fully-connected hidden layers of size 256. Following [Sharma et al., 2021b], we use a biased TD update, where For all experiments using prior data collected through RL, the agent was initialized at test time with the pretrained policy and critic. The details for this environment are in [Sharma et al., 2021b].
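
The network description above (policy and critic MLPs with 2 fully-connected hidden layers of size 256) can be illustrated with a minimal, dependency-free sketch. Only the layer count and the hidden size come from the text. The ReLU activations, the weight initialization, the observation and action dimensions, and the tanh-squashed Gaussian policy head are assumptions made for illustration (the last is common in soft actor-critic implementations but is not stated here), and the helper names `init_mlp`, `forward`, and `num_parameters` are hypothetical.

```python
import math
import random

HIDDEN_SIZE = 256  # from the text: hidden layers of size 256
NUM_HIDDEN = 2     # from the text: 2 fully-connected hidden layers


def init_mlp(in_dim, out_dim, rng):
    """Build (weights, biases) pairs for in_dim -> 256 -> 256 -> out_dim."""
    sizes = [in_dim] + [HIDDEN_SIZE] * NUM_HIDDEN + [out_dim]
    layers = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        scale = 1.0 / math.sqrt(fan_in)  # assumed init, not from the paper
        weights = [[rng.uniform(-scale, scale) for _ in range(fan_in)]
                   for _ in range(fan_out)]
        biases = [0.0] * fan_out
        layers.append((weights, biases))
    return layers


def forward(layers, x):
    """Apply the MLP to one input vector (ReLU on hidden layers: assumed)."""
    for i, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        if i < len(layers) - 1:  # no activation on the output layer
            x = [max(0.0, v) for v in x]
    return x


def num_parameters(layers):
    """Count all weights and biases in the network."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)


# Hypothetical dimensions, chosen only for illustration.
OBS_DIM, ACT_DIM = 17, 6
rng = random.Random(0)

# Policy head (assumed): a mean and a log-std for each action dimension.
policy = init_mlp(OBS_DIM, 2 * ACT_DIM, rng)
# Critic Q(s, a): concatenated state and action in, one scalar out.
critic = init_mlp(OBS_DIM + ACT_DIM, 1, rng)

obs = [0.1] * OBS_DIM
policy_out = forward(policy, obs)
mean, log_std = policy_out[:ACT_DIM], policy_out[ACT_DIM:]
action = [math.tanh(m) for m in mean]  # deterministic action in [-1, 1]
q_value = forward(critic, obs + action)[0]
```

Initializing the agent at test time with the pretrained policy and critic, as the text describes, would correspond to loading previously trained values into these `layers` structures rather than calling `init_mlp` with fresh random weights.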