Policy-Gradient Training of Language Models for Ranking