Internalizing World Models via Self-Play Finetuning for Agentic RL
