A Theoretical Derivations

Neural Information Processing Systems 

A brief proof is provided as follows.

B Implementation Details

Here, we describe certain implementation details of TEEN. For the recurrent optimization mentioned in Section 4.2, we set the period of
We provide the explicit parameters used in our algorithm in Table 1. To reproduce TD3, we use the official implementation (https://github.com/sfujim/TD3).

Table 1: Hyperparameters used in our experiments.

Hyperparameter                          Value
Batch size                              256
Discount (γ)                            0.99
Number of hidden layers                 2
Number of hidden units per layer        256
Activation function                     ReLU
Iterations per time step                1
Target smoothing coefficient (η)        5 · 10⁻³
Variance of target policy smoothing     0.2
Noise clip range                        [−0.5, 0.5]
Target critic update interval           2

C Additional Experimental Results

The bolded line represents the average evaluation over 5 seeds.
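To make the role of the Table 1 hyperparameters concrete, the following is a minimal NumPy sketch (not the official PyTorch implementation) of the two TD3 mechanisms they parameterize: the soft (Polyak) target update with coefficient η, and target policy smoothing with Gaussian noise of variance 0.2 clipped to [−0.5, 0.5]. The function names `soft_update` and `smoothed_target_action` are illustrative, not taken from the paper.

```python
import numpy as np

# Values from Table 1 (TD3-style setup); names are illustrative.
ETA = 5e-3          # target smoothing coefficient (η)
POLICY_NOISE = 0.2  # std of target policy smoothing noise
NOISE_CLIP = 0.5    # noise clipped to [-0.5, 0.5]

def soft_update(target_params, params, eta=ETA):
    """Polyak-average each target parameter toward its online counterpart."""
    return [(1.0 - eta) * t + eta * p for t, p in zip(target_params, params)]

def smoothed_target_action(action, rng, noise_std=POLICY_NOISE, clip=NOISE_CLIP):
    """Add clipped Gaussian noise to the target policy's action (target policy smoothing)."""
    noise = np.clip(rng.normal(0.0, noise_std, size=action.shape), -clip, clip)
    return action + noise
```

With η = 5 · 10⁻³ the target network tracks the online network slowly, which stabilizes the bootstrapped critic targets; the clipped noise keeps the smoothed target action close to the deterministic policy output.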
