Neural Information Processing Systems 

The active agent trains (as a regular Double-DQN) up to the time of forking, at which point the passive agent is created as a 'fork' (i.e., with identical network weights) of the active agent.
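The forking step can be illustrated with a minimal sketch. It assumes a toy agent whose "network" is a plain list of weights. The names `ToyAgent`, `train_step`, `fork`, and `train_then_fork` are hypothetical and not taken from the paper, and the update rule is a random placeholder rather than an actual Double-DQN gradient step. The only point shown is that the passive agent starts as an independent copy carrying the active agent's exact weights at the fork time.

```python
import copy
import random


class ToyAgent:
    """Hypothetical stand-in for a Double-DQN agent (online and target weights)."""

    def __init__(self, n_weights, seed=0):
        rng = random.Random(seed)
        self.online = [rng.uniform(-1.0, 1.0) for _ in range(n_weights)]
        self.target = list(self.online)

    def train_step(self, rng):
        # Placeholder update; a real agent would apply a Double-DQN gradient step.
        self.online = [w + 0.01 * rng.uniform(-1.0, 1.0) for w in self.online]


def fork(agent):
    """Create a new agent with identical weights that shares no state."""
    return copy.deepcopy(agent)


def train_then_fork(fork_step, n_weights=4, seed=0):
    """Train the active agent for `fork_step` steps, then fork the passive agent."""
    rng = random.Random(seed)
    active = ToyAgent(n_weights, seed)
    for _ in range(fork_step):
        active.train_step(rng)
    passive = fork(active)
    return active, passive


active, passive = train_then_fork(fork_step=10)
```

A deep copy is used here so that any later update to the active agent leaves the passive agent's weights untouched; copying only a reference would make the two agents share one set of weights.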