Policy Continuation with Hindsight Inverse Dynamics
Hao Sun, Zhizhong Li, Xiaotong Liu, Bolei Zhou, Dahua Lin
Neural Information Processing Systems
Solving goal-oriented tasks is an important but challenging problem in reinforcement learning (RL). For such tasks, the rewards are often sparse, making it difficult to learn a policy effectively. To tackle this difficulty, we propose a new approach called Policy Continuation with Hindsight Inverse Dynamics (PCHID). This approach learns from Hindsight Inverse Dynamics, building on Hindsight Experience Replay; it enables learning to proceed in a self-imitated manner and can therefore be trained with supervised learning. We further extend this idea to multi-step settings with Policy Continuation. The proposed method is general: it can work in isolation or be combined with other on-policy and off-policy algorithms.
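As a rough illustration of how hindsight inverse dynamics reduces to supervised learning, the sketch below relabels each transition with the goal that was actually achieved and regresses the action that produced it. The `PolicyNet` class, trajectory format, and `goal_fn` mapping are illustrative assumptions for a one-step case, not the paper's reference implementation.

```python
import numpy as np
import torch
import torch.nn as nn

# Minimal sketch of one-step Hindsight Inverse Dynamics (HID) training,
# assuming a goal-conditioned policy pi(a | s, g).

class PolicyNet(nn.Module):
    def __init__(self, state_dim, goal_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + goal_dim, 64), nn.ReLU(),
            nn.Linear(64, action_dim),
        )

    def forward(self, state, goal):
        return self.net(torch.cat([state, goal], dim=-1))

def hid_update(policy, optimizer, trajectory, goal_fn):
    """One supervised update: relabel each transition (s_t, a_t, s_{t+1})
    with the goal achieved at s_{t+1} (hindsight), then regress a_t."""
    states, goals, actions = [], [], []
    for s_t, a_t, s_next in trajectory:
        states.append(s_t)
        goals.append(goal_fn(s_next))  # hindsight goal: what was actually reached
        actions.append(a_t)
    states = torch.as_tensor(np.array(states), dtype=torch.float32)
    goals = torch.as_tensor(np.array(goals), dtype=torch.float32)
    actions = torch.as_tensor(np.array(actions), dtype=torch.float32)

    pred = policy(states, goals)
    loss = nn.functional.mse_loss(pred, actions)  # self-imitation as regression
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this sketch the multi-step extension with Policy Continuation would reuse the same relabeling over longer sub-trajectories, adding pairs only when a lower-step policy certifies them as reachable; that test is omitted here for brevity.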