Proximal Ranking Policy Optimization for Practical Safety in Counterfactual Learning to Rank
