Learning to Reason as Action Abstractions with Scalable Mid-Training RL
