We thank the reviewers for their careful reading and constructive comments, and have already updated the manuscript

Neural Information Processing Systems 

We thank the reviewers for their careful reading and constructive comments, and have already updated the manuscript. Learning (Meta-IL) we implemented a meta variant of DAgger and tuned it carefully. As shown in Table 1 and Figure 1, SMILe's performance is As such, in the Ant 2D Goal task we trained "fully observed" policies (using SAC) which observe Figure above, fully observable policies were not able to solve this sparse reward task. Generalizing to the 25 testing dynamics is highly non-trivial. Meta-DAgger performs substantially worse than SMILe on this task.

Similar Docs  Excel Report  more

TitleSimilaritySource
None found