Taming the Tail: Stable LLM Reinforcement Learning via Dynamic Vocabulary Pruning
