We found that training an inverse model is crucial for learning good representations; a sketch of such an objective is given at the end of this section.

Figure: on the first row, a level from each environment that one-shot PPGS fails to solve (the white arrows represent the policy).

Iterative Model Improvement. In general settings, collecting training trajectories by sampling actions uniformly at random does not grant sufficient coverage of the state space; a schematic of the resulting collect-and-retrain loop is also sketched below.

GLAMOR. GLAMOR [34] learns inverse dynamics to achieve visual goals in Atari games. The only difference with PPGS in terms of settings is that we allow GLAMOR to collect data on-policy and for more interactions (2M).
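Since both the representation-learning observation above and GLAMOR rest on inverse dynamics, a minimal PyTorch-style sketch of such an objective may help. The `Encoder` argument, the hidden width of 256, and the assumption of discrete actions are illustrative choices, not the architecture used in the paper.

```python
import torch
import torch.nn as nn


class InverseModel(nn.Module):
    """Predict the action taken between two consecutive observations.

    Sketch only: the encoder and head sizes are assumptions, not the
    paper's exact architecture.
    """

    def __init__(self, encoder: nn.Module, latent_dim: int, n_actions: int):
        super().__init__()
        self.encoder = encoder
        self.head = nn.Sequential(
            nn.Linear(2 * latent_dim, 256),
            nn.ReLU(),
            nn.Linear(256, n_actions),
        )

    def forward(self, obs: torch.Tensor, next_obs: torch.Tensor) -> torch.Tensor:
        # Encode both frames, concatenate the latents, classify the action.
        z, z_next = self.encoder(obs), self.encoder(next_obs)
        return self.head(torch.cat([z, z_next], dim=-1))


def inverse_loss(model: InverseModel, obs, next_obs, actions) -> torch.Tensor:
    # Cross-entropy over discrete actions; the gradient flows into the
    # encoder, which is what shapes the learned representation.
    logits = model(obs, next_obs)
    return nn.functional.cross_entropy(logits, actions)
```

The key design point is that the encoder is trained only through this auxiliary prediction task, so it is pushed to retain exactly the state features that determine which action was taken.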
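The iterative model improvement idea amounts to alternating data collection under the current model with retraining on the growing buffer. The sketch below is a schematic under hypothetical `collect_episodes` and `train_world_model` helpers; it is not the paper's implementation.

```python
from typing import Callable, List


def iterative_model_improvement(
    collect_episodes: Callable[[int, int], List],   # hypothetical: gathers trajectories with the current planner
    train_world_model: Callable[[List], None],      # hypothetical: refits the model on the buffer
    n_rounds: int = 10,
    episodes_per_round: int = 50,
) -> List:
    """Alternate on-policy data collection and model retraining (sketch)."""
    buffer: List = []
    for round_idx in range(n_rounds):
        # Early rounds resemble random exploration; later rounds start
        # from states the improved planner can already reach, extending
        # coverage beyond uniform random action sampling.
        buffer.extend(collect_episodes(episodes_per_round, round_idx))
        train_world_model(buffer)
    return buffer
```

Retraining on the full buffer rather than only the latest round keeps early random-exploration data in play, which guards against the model collapsing onto the narrow state distribution of the current planner.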