Reward Generation via Large Vision-Language Model in Offline Reinforcement Learning
Lee, Younghwan, Luu, Tung M., Lee, Donghoon, Yoo, Chang D.
arXiv.org Artificial Intelligence
Younghwan Lee (Electrical Engineering, KAIST, Daejeon, South Korea; youngh2@kaist.ac.kr), Chang D. Yoo (Electrical Engineering, KAIST, Daejeon, South Korea; cdyoo@kaist.ac.kr)

Abstract — In offline reinforcement learning (RL), learning from fixed datasets presents a promising solution for domains where real-time interaction with the environment is expensive or risky. However, designing dense reward signals for offline datasets requires significant human effort and domain expertise. Reinforcement learning from human feedback (RLHF) has emerged as an alternative, but it remains costly due to the human-in-the-loop process, prompting interest in automated reward generation models. To address this, we propose Reward Generation via Large Vision-Language Models (RG-VLM), which leverages the reasoning capabilities of LVLMs to generate rewards from offline data without human involvement.
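The abstract describes relabeling an unrewarded offline dataset by querying a vision-language model. A minimal sketch of that pipeline is below; note that `query_vlm` is a hypothetical stand-in (the paper's actual prompting and parsing are not given here), replaced by a toy heuristic so the example runs end to end.

```python
def query_vlm(frame, task_description):
    """Hypothetical stand-in for an LVLM call that rates task progress in [0, 1].

    A real implementation would send the observation (e.g. an image) and a
    natural-language task prompt to a vision-language model and parse its
    numeric answer. Here the frame is treated as a progress scalar directly,
    clipped to [0, 1], purely so the sketch is executable.
    """
    return max(0.0, min(1.0, frame))


def relabel_rewards(trajectory, task_description):
    """Attach VLM-generated rewards to an offline trajectory of (obs, action) pairs."""
    relabeled = []
    for frame, action in trajectory:
        reward = query_vlm(frame, task_description)
        relabeled.append((frame, action, reward))
    return relabeled


# Toy trajectory: observations are fake "progress" scalars, actions are labels.
traj = [(0.1, "a0"), (0.5, "a1"), (1.2, "a2")]
labeled = relabel_rewards(traj, "stack the red block on the blue block")
print([r for _, _, r in labeled])  # [0.1, 0.5, 1.0]
```

The relabeled dataset of (observation, action, reward) tuples could then be passed to any standard offline RL algorithm; the sketch only illustrates the reward-generation step, not the downstream training.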
Apr-15-2025