Reasoning Paths Optimization: Learning to Reason and Explore From Diverse Paths