Mars-PO: Multi-Agent Reasoning System Preference Optimization