A Broader Impact
Neural Information Processing Systems
For our specific algorithm, TD3+BC, given that its performance gain over existing state-of-the-art methods is minimal, it would be surprising to see our paper result in significant impact in these contexts.

B Experimental Details

We use the following software versions:

Python 3.6
PyTorch 1.4.0 [Paszke et al., 2019]
TensorFlow 2.4.0 [Abadi et al., 2016]
Gym 0.17.0 [Brockman et al., 2016]
MuJoCo 1.50

Hyperparameter tuning was performed on Hopper-medium-v0 and Hopper-expert-v0 with a single seed, which was unused in the final reported results. We use the default hyperparameters from each method's GitHub repository whenever possible.

Hyperparameter                  Value
Mini-batch size                 256
Discount factor                 0.99
Target update rate              5e-3
Target entropy                  -1 · Action Dim
Entropy in Q target             False
Actor activation function       ReLU

Generative Model Hyperparameters
Num Gaussians                   5
Optimizer                       Adam [Kingma and Ba, 2014]
Learning rate                   (1e-3, 1e-4, 1e-5)
Learning rate schedule          Piecewise linear (0, 8e5, 9e5)
Target entropy                  -1 · Action Dim

Generative Model Architecture
Hidden dim                      256
Hidden layers                   2
Activation function             ReLU

Fisher-BRC Hyperparameters
Gradient penalty λ              0.1
Reward bonus                    5

We use the default hyperparameters from the Fisher-BRC GitHub repository.

A concern of TD3+BC is its poor performance on random data.
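One plausible reading of the "Piecewise linear (0, 8e5, 9e5)" schedule over the learning rates (1e-3, 1e-4, 1e-5) is linear interpolation between the listed rates at the listed step boundaries. This interpretation is an assumption, not confirmed by the text; a minimal sketch under that assumption:

```python
# Hypothetical piecewise-linear learning-rate schedule: the rate is
# linearly interpolated between the listed values at the listed step
# boundaries, then held constant past the final boundary.
# The pairing of rates to boundaries is an assumed interpretation.
import numpy as np

def piecewise_linear_lr(step, boundaries=(0, 8e5, 9e5),
                        rates=(1e-3, 1e-4, 1e-5)):
    return float(np.interp(step, boundaries, rates))
```

Under this reading, the rate decays from 1e-3 to 1e-4 over the first 8e5 steps, then to 1e-5 by step 9e5.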
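For context, the TD3+BC actor objective discussed in the paper augments the standard TD3 policy-gradient term with a behavior-cloning penalty toward the dataset actions, with the Q term rescaled by its batch magnitude. A minimal PyTorch sketch, assuming the λ = α / mean|Q| normalization with α = 2.5 (the actor and critic signatures here are illustrative placeholders, not the authors' code):

```python
import torch

def td3_bc_actor_loss(actor, critic, state, action, alpha=2.5):
    """state, action: mini-batches drawn from the offline dataset."""
    pi = actor(state)                        # deterministic policy action
    q = critic(state, pi)                    # critic's value of that action
    lmbda = alpha / q.abs().mean().detach()  # normalize the Q-term scale
    # Maximize lambda * Q while regressing toward the dataset actions (BC).
    return -lmbda * q.mean() + ((pi - action) ** 2).mean()
```

The BC term keeps the learned policy close to actions seen in the dataset, which is consistent with the concern above: on random data, staying close to the behavior policy limits improvement.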