DialCoT Meets PPO: Decomposing and Exploring Reasoning Paths in Smaller Language Models
