Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward

Open in new window