Small batch deep reinforcement learning

Neural Information Processing Systems 

Since the policy used to collect transitions is changing throughout learning, the replay memory contains data coming from a mixture of policies (that differ from the agent's current policy), and

Similar Docs  Excel Report  more

TitleSimilaritySource
None found