Appendix

Neural Information Processing Systems 

A Method Details

A.1 The attention network

The attention network is implemented as a feedforward neural network with one hidden layer:

Input layer: 12 units
Hidden layer: N units, coupled with a dropout layer (p = 0.)

We extract as much information as possible from the three policies. The features should be cheap to compute and dependent on the current data, so we prefer features derived from the outputs of these policies (value, entropy, distance, return, etc.). Intuitively, the most informative features are the empirical returns, the value estimates associated with each policy, and the distances, which together give a good hint of which virtual policy will yield high performance (e.g., a virtual policy that is close to a policy that obtained a high return and a low value loss).

A.2 The advantage function

In this paper, we use GAE [18] as the advantage function for all models and experiments:

\hat{A}_t = \sum_{l=0}^{\infty} (\gamma \lambda)^l \delta^V_{t+l}, \qquad \delta^V_t = r_t + \gamma V(s_{t+1}) - V(s_t).

Note that Algo. 1 illustrates the procedure for a single actor.

A.3 The objective function

Following [19], our objective function also includes a value-loss term and an entropy bonus.
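The one-hidden-layer attention network above can be sketched as follows. This is a minimal illustration in NumPy, not the authors' implementation: the hidden width N = 16, the ReLU activation, the softmax output over the three policies, and all weight names are assumptions not specified in the text.

```python
import numpy as np

def attention_network(x, W1, b1, W2, b2, dropout_p=0.0, rng=None):
    """Feedforward net with one hidden layer: maps a 12-dim feature
    vector x to attention weights over the three policies (softmax).
    dropout_p > 0 applies inverted dropout to the hidden layer
    (training mode only; pass rng to enable it)."""
    h = np.maximum(0.0, W1 @ x + b1)          # hidden layer, ReLU (assumed)
    if dropout_p > 0.0 and rng is not None:   # inverted dropout on hidden units
        mask = rng.random(h.shape) >= dropout_p
        h = h * mask / (1.0 - dropout_p)
    logits = W2 @ h + b2
    z = np.exp(logits - logits.max())         # numerically stable softmax
    return z / z.sum()

# Example with placeholder sizes: 12 input features, N = 16 hidden units,
# 3 outputs (one attention weight per policy).
rng = np.random.default_rng(0)
N = 16
W1, b1 = rng.normal(size=(N, 12)), np.zeros(N)
W2, b2 = rng.normal(size=(3, N)), np.zeros(3)
weights = attention_network(rng.normal(size=12), W1, b1, W2, b2)
```

The softmax output makes the weights directly usable as a convex combination over the policies' parameters or actions, which matches the "virtual policy" interpretation above.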
