DialCoT Meets PPO: Decomposing and Exploring Reasoning Paths in Smaller Language Models