Uncertainty-Aware Instance Reweighting for Off-Policy Learning

Neural Information Processing Systems 

IPS is the probability ratio of taking an action between the learning and logging policies.