Supplementary Policy
–Neural Information Processing Systems
Let $\Delta_t(s, a) = Q(s, a) - \hat{Q}(s, a)$ and $F_t(s, a) = r_{\text{peer}} + \max_{b \in \mathcal{A}} Q(s', b) - \hat{Q}(s, a)$. In (A4), we robust DQN algorithm with peer sampling, in which the original loss is $\ell((s, a), y)$, also calibrated.
Feb-10-2026, 11:46:41 GMT