Improving Deep Reinforcement Learning by Reducing the Chain Effect of Value and Policy Churn

Neural Information Processing Systems 

Network outputs can change indirectly to unexpected values after any random batch update for input data not included in the batch, called churn in this paper.