Direct Value Optimization: Improving Chain-of-Thought Reasoning in LLMs with Refined Values