Supplementary material A Detailed description of baselines A.1 Continuous Baselines

Neural Information Processing Systems 

Multivariate Gaussian distributions, which is used as the final policy output. ... Humanoid experiments, the data consists of very diverse ways of running). In Table 5, we show the hyperparameters shared among our baselines. Distributed Distributional Deep Deterministic Policy Gradient [Barth-Maron et al., 2018] ... We used a batch size of 1024 for the experiments. Behavior Regularized Actor Critic [Wu et al., 2019] is an actor-critic algorithm where the ... We use the exact same network architecture as described in the original paper.
