Knapsack RL: Unlocking Exploration of LLMs via Optimizing Budget Allocation
