We sincerely thank the reviewers for their helpful comments

Neural Information Processing Systems 

We sincerely thank the reviewers for their helpful comments. The baselines do not solve BiMGame and AntMaze even with optimal trajectories, as Figs. D and E show. We see similar trends for AggreVaTeD. Although the baselines make some progress before stagnating, their cumulative terminal-only reward is 0 (see Lines 300-302). We only assume an ordering of state groups, which is implicit in many tasks.
