a8166da05c5a094f7dc03724b41886e5-Supplemental.pdf
Neural Information Processing Systems
For our specific algorithm, TD3+BC, given that the performance gain over existing state-of-the-art methods is minimal, it would be surprising to see our paper result in significant impact in these contexts. For CQL, we modify the GitHub defaults for the actor learning rate and use a fixed α rather than the Lagrange variant, matching the hyperparameters defined in their paper (which differ from the GitHub), as we found the original hyperparameters performed better. We can also choose λ by considering the value estimate of the agent: if we see divergence in the value function due to extrapolation error [Fujimoto et al., 2019], then we need to decrease λ such that the BC term is weighted more highly. We use the default hyperparameters in the Fisher-BRC GitHub. Figure 1: Percent difference of performance of offline RL algorithms when adding normalization to state features.
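The role of λ and of state-feature normalization can be illustrated with a minimal sketch. This excerpt does not give the exact formulas, so the code below rests on assumptions: normalization is taken to be per-feature standardization over the dataset with a small constant `eps`, and the actor objective is taken to be a λ-weighted value term minus a squared behavior-cloning (BC) error. The function names, the `eps` default, and the plain-list data layout are hypothetical and chosen only for illustration.

```python
import math


def normalize_states(states, eps=1e-3):
    """Standardize each state feature over the dataset (assumed form).

    states: list of equal-length lists of floats, one per transition.
    Returns the normalized states plus the per-feature means and
    standard deviations, so the same statistics can be reused later.
    """
    n = len(states)
    dim = len(states[0])
    means = [sum(s[j] for s in states) / n for j in range(dim)]
    stds = [
        math.sqrt(sum((s[j] - means[j]) ** 2 for s in states) / n)
        for j in range(dim)
    ]
    # eps keeps the division finite for features that never vary.
    normalized = [
        [(s[j] - means[j]) / (stds[j] + eps) for j in range(dim)]
        for s in states
    ]
    return normalized, means, stds


def actor_objective(q_values, policy_actions, dataset_actions, lam):
    """Batch mean of lam * Q minus the squared BC error (assumed form).

    Higher is better. A smaller lam shrinks the value term, so the BC
    term carries relatively more weight in the objective.
    """
    total = 0.0
    for q, pa, da in zip(q_values, policy_actions, dataset_actions):
        bc_error = sum((p - d) ** 2 for p, d in zip(pa, da))
        total += lam * q - bc_error
    return total / len(q_values)
```

Under these assumptions, lowering `lam` reduces the contribution of the value estimate relative to the BC error, which matches the qualitative guidance above: if the value function diverges, decreasing λ pushes the policy toward the actions in the dataset.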
Feb-10-2026, 12:44:45 GMT