Appendix

Neural Information Processing Systems

We now explain the motivation behind this feature design. We tried to extract as much information as possible from these three policies. The information should be cheap to extract and should depend on the current data, so we prefer features computed from the outputs of these policies (value, entropy, distance, return, etc.). Vθ is the value network, also parameterized by θ. We train all models with the Adam optimizer.
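As a rough illustration of what "cheap features from policy outputs" could look like, the sketch below builds a feature vector from the action distributions and value estimates of several policies on a single state. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the use of KL divergence as the "distance", and the example numbers are all hypothetical, and the return feature is omitted because it needs trajectory data.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete action distributions.

    Assumption: KL divergence stands in for the unspecified "distance".
    """
    return sum(
        pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0.0
    )


def policy_features(action_probs, values):
    """Build a flat feature vector from per-policy outputs for one state.

    action_probs: list of probability vectors, one per policy.
    values: list of scalar value estimates, one per policy.
    Layout: [values..., entropies..., pairwise KL distances...].
    """
    features = list(values)
    features += [entropy(p) for p in action_probs]
    n = len(action_probs)
    for i in range(n):
        for j in range(i + 1, n):
            features.append(kl_divergence(action_probs[i], action_probs[j]))
    return features


# Hypothetical outputs of three policies on a state with four actions.
probs = [
    [0.25, 0.25, 0.25, 0.25],
    [0.70, 0.10, 0.10, 0.10],
    [0.25, 0.25, 0.25, 0.25],
]
vals = [1.0, 0.5, 0.8]
feats = policy_features(probs, vals)
```

Every quantity here comes from a single forward pass per policy, which is the sense in which such features are cheap and depend only on the current data.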