Pessimistic Off-Policy Optimization for Learning to Rank
