A Appendix: RADIAL-DQN
Neural Information Processing Systems
This is not desirable, as a will always overlap with itself, and this is not something we wish to regularize. We initially used c = 0.5 for simplicity and did not tune it, as it worked well in our experiments. To understand the sensitivity of our algorithm to this choice of c, we conducted additional experiments on the two games RoadRunner and BankHeist. We can see that c = 0.75 produces poor standard performance; we believe this is because the requirement is too strict, causing the policy to collapse to a simpler policy with lower reward. The standard DQN was trained for 6M steps, followed by 4.5M steps of RADIAL training.
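The two-phase schedule above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the constant names and the phase-selection helper are hypothetical, with the step counts taken directly from the text (6M standard DQN steps, then 4.5M RADIAL steps).

```python
# Hypothetical sketch of the two-phase training schedule described above.
# Step counts are from the text; names are illustrative only.
STANDARD_STEPS = 6_000_000   # standard DQN training
RADIAL_STEPS = 4_500_000     # subsequent RADIAL training

def training_phase(step: int) -> str:
    """Return which loss would be active at a given global step."""
    if step < STANDARD_STEPS:
        return "standard"
    if step < STANDARD_STEPS + RADIAL_STEPS:
        return "radial"
    return "done"
```

A training loop would then switch from the standard DQN loss to the robust RADIAL loss once `training_phase` returns `"radial"`.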