4496bf24afe7fab6f046bf4923da8de6-AuthorFeedback.pdf

Neural Information Processing Systems 

This is especially true because practical deployments of RL are bottlenecked by its poor sample efficiency.

We didn't know about D4RL when writing the paper (it is a recent preprint), but we have now run the experiment on maze2d-umaze (Fig. a). Our model significantly outperforms the baselines and the ablations. Our experiment on D4RL also shows clear improvement over baselines and ablations (Fig. a).

On WalkerParam, we agree with your analysis and will clarify in the paper that the performance improvement in WalkerParam comes from distillation.
