Mix- and MoE-DPO: A Variational Inference Approach to Direct Preference Optimization
