a8166da05c5a094f7dc03724b41886e5-Supplemental.pdf

Feb-10-2026, 12:44:45 GMT–Neural Information Processing Systems

For our specific algorithm, TD3+BC, given that the performance gain over existing state-of-the-art methods is minimal, it would be surprising to see our paper result in significant impact in these contexts. For CQL we modify the GitHub defaults for the actor learning rate and use a fixed α rather than the Lagrange variant, matching the hyperparameters defined in their paper (which differ from the GitHub), as we found the original hyperparameters performed better. We can also choose λ by considering the value estimate of the agent: if we see divergence in the value function due to extrapolation error [Fujimoto et al., 2019], then we need to decrease λ such that the BC term is weighted more highly. We use the default hyperparameters in the Fisher-BRC GitHub. Figure 1: Percent difference of performance of offline RL algorithms when adding normalization to state features.
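The two mechanisms mentioned in the excerpt, normalizing state features and trading off the value term against the BC term through λ, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the `eps` constant, and the exact form of the loss (a λ-weighted mean Q-value minus a mean squared BC error) are assumptions made for illustration only.

```python
import numpy as np

def normalize_states(states, eps=1e-3):
    """Standardize each state feature over the dataset.

    eps is a hypothetical small constant that avoids division by zero.
    """
    mean = states.mean(axis=0, keepdims=True)
    std = states.std(axis=0, keepdims=True) + eps
    return (states - mean) / std, mean, std

def actor_loss(q_values, policy_actions, dataset_actions, lam):
    """Assumed loss form: -lam * mean(Q) + mean squared BC error.

    A smaller lam shrinks the Q term, so the BC term dominates.
    """
    bc_term = np.mean((policy_actions - dataset_actions) ** 2)
    return -lam * q_values.mean() + bc_term
```

Under this assumed form, lowering `lam` leaves the BC error unchanged while shrinking the contribution of the value estimate, which is the adjustment the text suggests when the value function diverges.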

artificial intelligence, machine learning, reinforcement learning, (15 more...)




Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.35)
