 state-action pair









Appendix to Weakly Coupled Deep Q-Networks A Proofs

Neural Information Processing Systems

We prove the first part of the proposition (weak duality) by induction. It is well-known that, by the value iteration algorithm's convergence, Q ... Consider a state s ∈ S and a feasible action a ∈ A(s). We use an induction proof. ... B(w), which follows by the convergence of value iteration. A.2 Proof of Theorem 1. Proof. Now we state the following lemma.
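The excerpt leans on the convergence of value iteration over states s and feasible actions a ∈ A(s). As a rough illustration only, the sketch below runs tabular Q-value iteration on a toy two-state MDP. The MDP, the function name `q_value_iteration`, and the data layout of `P` and `R` are all assumptions made for this example and do not come from the paper.

```python
def q_value_iteration(P, R, gamma=0.9, tol=1e-10, max_iters=10_000):
    """Tabular Q-value iteration (illustrative sketch, not the paper's code).

    P[s][a] is a list of (probability, next_state) pairs.
    R[s][a] is the immediate reward for taking action a in state s.
    Each state may have a different number of feasible actions.
    """
    n_states = len(P)
    Q = [[0.0 for _ in P[s]] for s in range(n_states)]
    for _ in range(max_iters):
        delta = 0.0
        new_Q = []
        for s in range(n_states):
            row = []
            for a in range(len(P[s])):
                # Bellman optimality backup over the feasible actions of s2.
                backup = R[s][a] + gamma * sum(
                    p * max(Q[s2]) for p, s2 in P[s][a]
                )
                delta = max(delta, abs(backup - Q[s][a]))
                row.append(backup)
            new_Q.append(row)
        Q = new_Q
        if delta < tol:  # successive iterates have stopped changing
            break
    return Q


# Hypothetical toy MDP: state 0 has two feasible actions, state 1 has one.
P = [
    [[(1.0, 0)], [(1.0, 1)]],  # state 0: stay, or move to state 1
    [[(1.0, 1)]],              # state 1: stay
]
R = [
    [0.0, 1.0],
    [2.0],
]
Q = q_value_iteration(P, R, gamma=0.5)
```

Because the Bellman backup is a contraction for gamma < 1, the iterates approach the fixed point Q[1][0] = 4, Q[0][1] = 3, and Q[0][0] = 1.5 for this toy problem.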



A Hyperparameter Settings of RD

Neural Information Processing Systems

In this section, we describe the hyperparameter settings of RD. For SAC-N-Unc and TD3-N-Unc, M is set to 1/10 of the total training steps. To ensure fairness, algorithms employing RD are implemented using the CORL repository [54]. We derive these backbone algorithms by modifying the original SAC/TD3 algorithm to employ an ensemble of N critics and to incorporate an uncertainty regularization term in the policy update. Additionally, RD with fewer Q ensembles can achieve similar or even better results than the backbone methods with more Q ensembles, indicating its potential to reduce computing resource consumption.
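The excerpt does not give the form of the uncertainty regularization term. One common choice, assumed here purely for illustration, is to penalize the ensemble mean by the disagreement (standard deviation) among the N critics. The function name `uncertainty_regularized_value` and the weight `beta` are hypothetical and are not taken from the paper or from the CORL repository.

```python
from statistics import fmean, pstdev


def uncertainty_regularized_value(q_values, beta=1.0):
    """Ensemble mean minus a disagreement penalty (assumed form).

    q_values: Q estimates from the N critics for one state-action pair.
    beta: hypothetical weight on the uncertainty penalty.
    """
    if not q_values:
        raise ValueError("need at least one critic estimate")
    # pstdev is the population standard deviation; it is 0 when all
    # critics agree, so the penalty vanishes in that case.
    return fmean(q_values) - beta * pstdev(q_values)


# Critics that agree incur no penalty; critics that disagree are penalized.
agree = uncertainty_regularized_value([2.0, 2.0, 2.0])
disagree = uncertainty_regularized_value([1.0, 3.0], beta=1.0)
```

In a policy update, the actor would maximize such a penalized value, which steers it away from actions on which the critics disagree.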