Goto

Collaborating Authors

 bellemare







Small batch deep reinforcement learning

Neural Information Processing Systems

Since the policy used to collect transitions is changing throughout learning, the replay memory contains data coming from a mixture of policies (that differ from the agent's current policy), and