Neural Information Processing Systems
A.1 Additional Method Justification

The key idea of QWALE is to lead the agent to nearby states within the distribution of the prior data if it is out of distribution, and to nearby states closer to task completion if it is in distribution. This problem has been studied in stochastic optimal control, particularly REPS [Peters et al., 2010]. We use this update for all our evaluated methods online in order to improve stability. For all experiments using prior data collected through RL, the agent was initialized at test time with the pretrained policy and critic. The details for this environment are in [Sharma et al., 2021b].
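
To make the two cases in the key idea concrete, the following is a toy sketch only, not the QWALE algorithm itself. It assumes states are small numeric tuples, that "out of distribution" can be approximated by Euclidean distance to the nearest prior state exceeding a threshold, and that each prior state carries a progress score (for example, a critic value). The names `pick_target_state`, `prior_values`, and `ood_radius` are hypothetical and do not come from the paper.

```python
import math


def pick_target_state(state, prior_states, prior_values, ood_radius=1.0):
    """Toy illustration: pick the prior state the agent should be led toward.

    state        -- current state, a tuple of floats
    prior_states -- list of states from the prior data (tuples of floats)
    prior_values -- hypothetical progress scores, higher = closer to completion
    ood_radius   -- hypothetical distance threshold for "in distribution"
    """
    dists = [math.dist(state, s) for s in prior_states]
    nearest = min(range(len(prior_states)), key=dists.__getitem__)

    if dists[nearest] > ood_radius:
        # Out of distribution: head back to the nearest state in the prior data.
        return prior_states[nearest], "out_of_distribution"

    # In distribution: among nearby prior states, prefer the one with the
    # highest progress score, i.e. the one closest to task completion.
    nearby = [i for i, d in enumerate(dists) if d <= ood_radius]
    best = max(nearby, key=prior_values.__getitem__)
    return prior_states[best], "in_distribution"


# Prior data along a line, with progress increasing to the right.
prior_states = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
prior_values = [0.0, 0.5, 1.0]

far_target, far_case = pick_target_state((5.0, 0.0), prior_states, prior_values)
near_target, near_case = pick_target_state((0.5, 0.0), prior_states, prior_values)
```

In this sketch the far-away state is steered back to its nearest prior state, while the state already near the prior data is steered toward the nearby prior state with the higher progress score. A hard distance threshold is used here purely for readability; it is an assumption of the sketch rather than something stated in the text.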
Feb-9-2026, 08:15:37 GMT