Trajectory Bellman Residual Minimization: A Simple Value-Based Method for LLM Reasoning
