Swapped goal-conditioned offline reinforcement learning
Yang, Wenyan, Wang, Huiling, Cai, Dingding, Pajarinen, Joni, Kämäräinen, Joni-Kristian
Offline goal-conditioned reinforcement learning (GCRL) is challenging because agents tend to overfit to the given dataset. To generalize agents' skills beyond the dataset, we propose a goal-swapping procedure that generates additional trajectories. To alleviate noise and extrapolation errors, we present a general offline reinforcement learning method called deterministic Q-advantage policy gradient (DQAPG). In the experiments, DQAPG outperforms state-of-the-art goal-conditioned offline RL methods across a wide range of benchmark tasks, and goal swapping further improves the test results. Notably, the proposed method attains good performance on challenging dexterous in-hand manipulation tasks on which prior methods fail.
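The abstract describes goal swapping only at a high level: new goal-conditioned training data are generated by re-labelling trajectories with goals taken from elsewhere in the dataset. The sketch below illustrates that idea under assumptions not stated in the abstract; the function name `goal_swap_augment`, the dictionary trajectory format, and the sparse distance-thresholded reward are all hypothetical stand-ins, not the paper's actual implementation.

```python
import numpy as np

def goal_swap_augment(trajectories, tol=0.05, rng=None):
    """Sketch of goal-swapping data augmentation (assumed form):
    each trajectory is duplicated with a goal drawn from a different
    trajectory, yielding extra goal-conditioned training data."""
    rng = rng or np.random.default_rng()
    goals = [traj["goal"] for traj in trajectories]
    augmented = []
    for traj in trajectories:
        # Re-label this trajectory with a goal from elsewhere in the dataset.
        swapped_goal = goals[rng.integers(len(goals))]
        augmented.append({
            "observations": traj["observations"],
            "actions": traj["actions"],
            "goal": swapped_goal,
            # Assumed sparse reward: 1 when the achieved state lies within
            # a tolerance of the swapped goal, 0 otherwise.
            "rewards": np.array([
                float(np.linalg.norm(s - swapped_goal) < tol)
                for s in traj["achieved_goals"]
            ]),
        })
    return trajectories + augmented
```

Under these assumptions, the augmented set covers (state, goal) combinations never paired in the original data, which is the generalization benefit the abstract attributes to goal swapping.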
arXiv.org Artificial Intelligence
Feb-17-2023