TeViR: Text-to-Video Reward with Diffusion Models for Efficient Reinforcement Learning
Yuhui Chen, Haoran Li, Zhennan Jiang, Haowei Wen, Dongbin Zhao
arXiv.org Artificial Intelligence
Developing scalable and generalizable reward engineering for reinforcement learning (RL) is crucial for creating general-purpose agents, especially in the challenging domain of robotic manipulation. While recent advances in reward engineering with Vision-Language Models (VLMs) have shown promise, their sparse reward nature significantly limits sample efficiency. This paper introduces TeViR, a novel method that leverages a pre-trained text-to-video diffusion model to generate dense rewards by comparing the predicted image sequence with current observations. Experimental results across 13 simulation and real-world robotic tasks demonstrate that TeViR outperforms traditional methods leveraging sparse rewards and other state-of-the-art (SOTA) methods, achieving better sample efficiency and performance without ground-truth environmental rewards. TeViR's ability to efficiently guide agents in complex environments highlights its potential to advance reinforcement learning applications in robotic manipulation.

Developing general-purpose agents with reinforcement learning (RL) necessitates scalable and generalizable reward engineering to provide effective task specifications for downstream policy learning [1]. Reward engineering is crucial because it determines the policies agents can learn and ensures they align with intended objectives. However, the manual design of reward functions often presents significant challenges [2]-[4], particularly in robotic manipulation tasks [5]-[8]. This challenge has emerged as a major bottleneck in developing general-purpose agents. Although inverse reinforcement learning (IRL) [9] learns rewards from pre-collected expert demonstrations, the learned reward functions are unreliable for policy learning due to noise and misspecification errors [10], especially in robotic manipulation tasks where in-domain data is limited [11]. Additionally, the learned reward functions are not generally applicable across tasks.
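To make the dense-reward idea concrete, the sketch below scores each environment observation against the frame sequence predicted by a text-to-video diffusion model, yielding a per-step reward rather than a single sparse success signal. This is a minimal pixel-space simplification under stated assumptions, not the authors' implementation: the function name `tevir_dense_reward`, the one-frame-per-timestep alignment, and the use of a raw pixel distance are all hypothetical; a practical system would likely compare learned visual embeddings instead of raw pixels.

```python
import numpy as np

def tevir_dense_reward(predicted_frames, observation, step):
    """Hypothetical sketch of a TeViR-style dense reward.

    predicted_frames: (T, H, W, C) array of frames generated by a
                      text-to-video diffusion model conditioned on the
                      task's text instruction (assumed precomputed).
    observation:      (H, W, C) current camera image from the environment.
    step:             current environment timestep, assumed roughly
                      aligned with the predicted frame index.
    """
    # Pick the predicted frame corresponding to the current timestep,
    # clamping to the last frame once the episode outlasts the video.
    target = predicted_frames[min(step, len(predicted_frames) - 1)]

    # Negative distance between the observation and the predicted frame
    # acts as a dense reward: the closer the rollout tracks the predicted
    # image sequence, the higher the reward. Normalizing by the number of
    # pixels keeps the scale independent of image resolution.
    diff = observation.astype(np.float32) - target.astype(np.float32)
    return -np.linalg.norm(diff) / observation.size
```

The design choice this illustrates is the contrast drawn in the abstract: instead of a VLM emitting a sparse success/failure signal at episode end, every timestep receives informative feedback, which is what drives the sample-efficiency gains the paper reports.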
Jun-25-2025
- Genre:
- Research Report
- New Finding (0.46)
- Promising Solution (0.34)
- Industry:
- Education (0.46)