TROLL: Trust Regions improve Reinforcement Learning for Large Language Models