Clip-Low Increases Entropy and Clip-High Decreases Entropy in Reinforcement Learning of Large Language Models

Open in new window