Implicit Actor-Critic Coupling via a Supervised Learning Framework for RLVR
