Faster Deep Reinforcement Learning with Slower Online Network

Neural Information Processing Systems 

Deep reinforcement learning algorithms often use two networks for value function optimization: an online network and a target network that tracks the online network with some delay. Using two separate networks enables the agent to hedge against issues that arise when bootstrapping.
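
To make the two-network setup concrete, the following is a minimal sketch rather than the method proposed in this paper. It assumes a tabular Q-function standing in for network parameters, a Q-learning style bootstrapped target, and a periodic hard copy as the delayed tracking rule. The names `td_target`, `sync_target`, `alpha`, and `period` are hypothetical and chosen only for illustration.

```python
import numpy as np


def td_target(reward, next_q_target, gamma=0.99, done=False):
    """Bootstrapped target built from the *target* network's estimates."""
    bootstrap = 0.0 if done else gamma * float(np.max(next_q_target))
    return reward + bootstrap


def sync_target(online, target, step, period):
    """Hard update (assumed here): copy online into target every `period` steps."""
    if step % period == 0:
        return online.copy()
    return target


# Tabular Q-values stand in for network parameters (2 states, 2 actions).
online = np.zeros((2, 2))
target = online.copy()
alpha, period = 0.5, 3  # illustrative step size and sync period

for step in range(1, 7):
    # A single fixed transition, repeated, purely for illustration.
    s, a, r, s_next = 0, 1, 1.0, 1
    # The target is computed from the delayed copy, not from `online`.
    y = td_target(r, target[s_next])
    # Only the online estimate is moved toward the target.
    online[s, a] += alpha * (y - online[s, a])
    # The target copy catches up with the online values only periodically.
    target = sync_target(online, target, step, period)
```

Between synchronizations the bootstrapped target `y` stays fixed while `online` changes, which is the delay described above.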