MACPO: Weak-to-Strong Alignment via Multi-Agent Contrastive Preference Optimization
