be3e9d3f7d70537357c67bb3f4086846-Supplemental.pdf

Neural Information Processing Systems 

A maximum of 20K generations is specified for training, but training is stopped early if the performance has converged. We consider two possible approaches when we take sample efficiency into consideration.

A.4.2 PyBullet Ant

In the PyBullet Ant experiment, we demonstrated that a pre-trained policy can be converted into a permutation invariant one with behavior cloning (BC). We give a detailed task description and the experimental setup here. The second, larger policy is similar in architecture, but we added one more FC layer and expanded all hidden sizes to 128 to increase its expressiveness.
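To make the behavior cloning step concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical linear teacher policy and a linear student fitted by gradient descent on the mean squared error between student and teacher actions; the student here is a plain linear map, not the permutation invariant architecture. The dimensions, learning rate, and convergence tolerance are illustrative assumptions, and the early stop on a converged loss only mirrors the stopping rule in spirit.

```python
import numpy as np

# Illustrative sizes (assumptions, not taken from the source).
OBS_DIM, ACT_DIM, NUM_SAMPLES = 28, 8, 1000

rng = np.random.default_rng(0)

# Hypothetical pre-trained "teacher" policy: a fixed linear map.
teacher_weights = rng.standard_normal((OBS_DIM, ACT_DIM))


def teacher_policy(obs):
    """Return the teacher's action for each observation row."""
    return obs @ teacher_weights


def behavior_cloning(obs, actions, lr=0.5, max_steps=2000, tol=1e-8):
    """Fit a linear student to (obs, actions) pairs by gradient descent.

    Stops early once the loss changes by less than `tol` between steps.
    """
    num_samples, obs_dim = obs.shape
    act_dim = actions.shape[1]
    weights = np.zeros((obs_dim, act_dim))
    losses = []
    for _ in range(max_steps):
        error = obs @ weights - actions
        loss = float(np.mean(error ** 2))
        losses.append(loss)
        # Early stopping when the loss has converged.
        if len(losses) > 1 and abs(losses[-2] - loss) < tol:
            break
        # Gradient of the mean squared error with respect to the weights.
        grad = 2.0 * obs.T @ error / (num_samples * act_dim)
        weights -= lr * grad
    return weights, losses


# Collect demonstrations by querying the teacher on sampled observations.
observations = rng.standard_normal((NUM_SAMPLES, OBS_DIM))
teacher_actions = teacher_policy(observations)

weights, losses = behavior_cloning(observations, teacher_actions)
print(f"steps: {len(losses)}, final loss: {losses[-1]:.2e}")
```

In an actual experiment the demonstrations would come from rolling out the pre-trained policy in the environment, and the student would be the permutation invariant network rather than a linear map.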
