Direct Value Optimization: Improving Chain-of-Thought Reasoning in LLMs with Refined Values
