Flow-DPO: Improving LLM Mathematical Reasoning through Online Multi-Agent Learning