Escaping the Verifier: Learning to Reason via Demonstrations