Enhancing LLM Reasoning with Iterative DPO: A Comprehensive Empirical Investigation