Aligning Large Vision-Language Models by Deep Reinforcement Learning and Direct Preference Optimization

Open in new window