RL Tango: Reinforcing Generator and Verifier Together for Language Reasoning
