Implicit Distributional Reinforcement Learning: Appendix

A  Proof of Lemma 1

Denote $\mathcal{H} = \mathbb{E}_{a \sim \pi}[\log \pi(a \mid s)]$.


Additional ablation studies on Ant are shown in Figure 1a for a thorough comparison. On Ant, the performance of IDAC is on par with that of IDAC-Gaussian, which outperforms the other variants. Furthermore, to study the interaction between DGN and SIA, we run ablation studies that hold each of them in turn as a control factor; we conduct the corresponding experiments on Walker2d. From Figure 1b, we observe that removing either SIA (resulting in IDAC-Gaussian) or DGN (resulting in IDAC-noDGN) from IDAC generally degrades its performance, which echoes our motivation for integrating DGN and SIA so that they strengthen each other: (i) modeling G exploits distributional information to better estimate its mean Q (note that C51, which outperforms DQN by exploiting distributional information, also conducts its argmax operation on Q); (ii) a more flexible policy becomes more useful given a better estimated Q. In Figure 1, we also include a thorough comparison with SDPG (implemented on top of the stable-baselines codebase).
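To make point (i) concrete, the following minimal sketch (not the authors' implementation) shows how a distributional critic can produce samples of the return G and how the scalar Q used downstream is simply the sample mean of those returns, analogous to how C51 derives Q from its return distribution before taking an argmax. The class name DistributionalCritic, the noise dimension, network sizes, and sample count are illustrative assumptions rather than settings from the paper.

import torch
import torch.nn as nn


class DistributionalCritic(nn.Module):
    """Implicit distributional critic: maps (state, action, noise) to a return sample."""

    def __init__(self, state_dim, action_dim, noise_dim=5, hidden=64):
        super().__init__()
        self.noise_dim = noise_dim
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim + noise_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def sample_returns(self, state, action, n_samples=32):
        # Draw n_samples return samples G(s, a, eps) by feeding independent noise draws.
        batch = state.shape[0]
        s = state.unsqueeze(1).expand(-1, n_samples, -1)
        a = action.unsqueeze(1).expand(-1, n_samples, -1)
        eps = torch.randn(batch, n_samples, self.noise_dim)
        g = self.net(torch.cat([s, a, eps], dim=-1)).squeeze(-1)  # (batch, n_samples)
        return g

    def q_value(self, state, action, n_samples=32):
        # Point (i): the scalar Q estimate is the mean of the sampled return distribution.
        return self.sample_returns(state, action, n_samples).mean(dim=-1)


if __name__ == "__main__":
    critic = DistributionalCritic(state_dim=17, action_dim=6)
    s = torch.randn(4, 17)
    a = torch.randn(4, 6)
    print("return samples:", critic.sample_returns(s, a).shape)  # (4, 32)
    print("Q estimates:   ", critic.q_value(s, a).shape)         # (4,)

Removing the distributional component (the ablation labeled IDAC-noDGN above) would correspond to regressing a single scalar output directly, discarding the spread information that the sampled returns carry.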