RL Grokking Recipe: How Does RL Unlock and Transfer New Algorithms in LLMs?
