Enhancing Multimodal LLM for Detailed and Accurate Video Captioning using Multi-Round Preference Optimization