Robust On-Policy Sampling for Data-Efficient Policy Evaluation in Reinforcement Learning

Neural Information Processing Systems 

On-policy algorithms learn about a particular target policy using data collected by behaving according to the target policy.