VPO: Reasoning Preferences Optimization Based on $\mathcal{V}$-Usable Information
