CoinDICE: Off-Policy Confidence Interval Estimation

Neural Information Processing Systems 

One of the major barriers to applying reinforcement learning (RL) is the difficulty of evaluating new policies reliably before deployment, a problem generally known as off-policy evaluation (OPE).