TimeRewarder: Learning Dense Reward from Passive Videos via Frame-wise Temporal Distance
Liu, Yuyang, Wen, Chuan, Hu, Yihang, Jayaraman, Dinesh, Gao, Yang
arXiv.org Artificial Intelligence
Designing dense rewards is crucial for reinforcement learning (RL), yet in robotics it often demands extensive manual effort and scales poorly. One promising solution is to treat task progress as a dense reward signal, since it quantifies how far each action advances the system toward task completion over time. We present TimeRewarder, a simple yet effective reward-learning method that derives progress-estimation signals from passive videos, including robot demonstrations and human videos, by modeling temporal distances between frame pairs. We then demonstrate how TimeRewarder can supply step-wise proxy rewards to guide reinforcement learning. In comprehensive experiments on ten challenging Meta-World tasks, TimeRewarder dramatically improves RL for sparse-reward tasks, achieving nearly perfect success on 9/10 tasks with only 200,000 environment interactions per task. It outperforms previous methods, and even the manually designed dense environment reward, in both final success rate and sample efficiency. Moreover, TimeRewarder can exploit real-world human videos, highlighting it as a scalable path to rich reward signals from diverse video sources.

Mirroring how humans infer task progression by observing others, TimeRewarder distills frame-wise temporal distances from expert videos and converts them into dense reward signals, enabling reinforcement learning free of manually engineered rewards or action annotations. Reinforcement learning (RL) has long served as a principal paradigm for robotic skill acquisition (Ibarz et al., 2021; Tang et al., 2025).
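The core recipe the abstract describes, labeling frame pairs from passive videos with their normalized temporal distance and then using a learned progress predictor to produce step-wise proxy rewards, can be sketched as follows. This is a minimal illustration, not the paper's implementation: `pair_labels`, `proxy_reward`, and the oracle model in the usage example are hypothetical stand-ins.

```python
import numpy as np

def pair_labels(num_frames, num_pairs, rng):
    """Sample frame-index pairs (i, j) from one expert video and label
    each pair with the signed temporal distance from frame i to frame j,
    normalized to [-1, 1]. These (pair, label) examples would supervise
    a frame-pair temporal-distance model.
    (Hypothetical data construction for a TimeRewarder-style method.)"""
    i = rng.integers(0, num_frames, size=num_pairs)
    j = rng.integers(0, num_frames, size=num_pairs)
    pairs = np.stack([i, j], axis=1)
    labels = (j - i) / (num_frames - 1)  # signed, normalized distance
    return pairs, labels

def proxy_reward(distance_model, obs_t, obs_t_next):
    """Step-wise proxy reward for RL: the model's predicted temporal
    distance from the current observation to the next one. Positive
    values indicate a step that advanced task progress."""
    return distance_model(obs_t, obs_t_next)

# Toy usage: if the learned model were a perfect progress oracle on a
# 1-D "progress" observation, a step that raises progress earns r > 0.
oracle = lambda s, s_next: s_next - s
r = proxy_reward(oracle, 0.3, 0.5)  # positive: the step made progress
```

In practice the distance model would be a neural network over image features; the sketch only shows how the supervision labels and the per-step reward are wired together.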
Oct-1-2025
- Country:
- Asia > China
- North America > United States
- Pennsylvania (0.04)
- Genre:
- Research Report (1.00)
- Technology: