ORPO-Distill: Mixed-Policy Preference Optimization for Cross-Architecture LLM Distillation
Aasheesh Singh, Vishal Vaddina, Dagnachew Birru
arXiv.org Artificial Intelligence
We introduce ORPO-Distill, a general-purpose method for cross-architecture LLM distillation that formulates the problem as a preference optimization task. Unlike standard chain-of-thought (CoT) distillation, the approach transfers knowledge through diverse reasoning traces. It employs an Odds-Ratio Preference Optimization (ORPO) objective that contrasts teacher and student traces for more effective learning, and adopts a mixed-policy strategy for incorporating student-generated outputs that outperforms both purely off-policy and purely on-policy alternatives. Experiments on five datasets and multiple student models show consistent improvements over conventional black-box knowledge-distillation baselines.
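To make the abstract's description concrete, below is a minimal sketch of an ORPO-style objective in which teacher reasoning traces serve as the preferred ("chosen") responses and student-generated traces as the dispreferred ("rejected") ones. This is not the authors' implementation: the function name `orpo_distill_loss`, the argument names, and the weighting term `lam` are illustrative assumptions based on the standard ORPO formulation (SFT loss plus a log-odds-ratio penalty); the paper's exact loss and mixed-policy sampling schedule may differ.

```python
# Hypothetical sketch of an ORPO-style distillation loss (not the paper's code).
# Assumes the student model has already scored both traces; teacher traces are
# treated as "chosen" and student traces as "rejected".
import torch
import torch.nn.functional as F


def orpo_distill_loss(chosen_logps: torch.Tensor,
                      rejected_logps: torch.Tensor,
                      sft_nll: torch.Tensor,
                      lam: float = 0.1) -> torch.Tensor:
    """ORPO-style loss: an SFT term on the teacher trace plus an odds-ratio
    term contrasting teacher (chosen) and student (rejected) traces.

    chosen_logps / rejected_logps: length-normalized log P(y|x) under the
        student model for the teacher and student traces, respectively.
    sft_nll: token-averaged negative log-likelihood of the teacher trace.
    lam: weight of the odds-ratio term (an assumed hyperparameter).
    """
    # log-odds: log(p / (1 - p)), computed stably from log p (log p < 0 here).
    log_odds_chosen = chosen_logps - torch.log1p(-torch.exp(chosen_logps))
    log_odds_rejected = rejected_logps - torch.log1p(-torch.exp(rejected_logps))
    # Encourage higher odds for teacher traces than for student traces.
    or_loss = -F.logsigmoid(log_odds_chosen - log_odds_rejected)
    return (sft_nll + lam * or_loss).mean()
```

In a mixed-policy setup as described in the abstract, one would plausibly draw the "rejected" traces partly from the student's current policy (on-policy samples) and partly from a fixed pool of earlier student generations (off-policy samples); the mixing ratio is not specified in this abstract.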
Sep-30-2025