Trajectory Bellman Residual Minimization: A Simple Value-Based Method for LLM Reasoning
