The Provable Benefits of Unsupervised Data Sharing for Offline Reinforcement Learning

Hu, Hao, Yang, Yiqin, Zhao, Qianchuan, Zhang, Chongjie

Feb-26-2023, arXiv.org Artificial Intelligence

Self-supervised methods have become crucial for advancing deep learning by leveraging the data itself to reduce the need for expensive annotations. However, how to conduct self-supervised offline reinforcement learning (RL) in a principled way remains unclear. In this paper, we address this issue by investigating the theoretical benefits of utilizing reward-free data in linear Markov Decision Processes (MDPs) within a semi-supervised setting. Further, we propose a novel Provable Data Sharing (PDS) algorithm to utilize such reward-free data for offline RL. PDS adds penalties to the reward function learned from labeled data to prevent overestimation, which keeps the algorithm conservative. Our results on various offline RL tasks demonstrate that PDS significantly improves the performance of offline RL algorithms with reward-free data. Overall, our work provides a promising approach to leveraging the benefits of unlabeled data in offline RL while maintaining theoretical guarantees. We believe our findings will contribute to developing more robust self-supervised RL methods.

Offline reinforcement learning (RL) is a promising framework for learning sequential policies with pre-collected datasets.
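The abstract's idea of penalizing a reward function learned from labeled data can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a linear reward model fitted by ridge regression and an elliptical uncertainty penalty, and the names `fit_reward_ridge`, `pessimistic_rewards`, `lam`, and `beta` are hypothetical choices made for illustration only.

```python
import numpy as np

def fit_reward_ridge(phi_labeled, rewards, lam=1.0):
    """Ridge regression for a linear reward model r(s, a) ~ phi(s, a) @ theta."""
    d = phi_labeled.shape[1]
    # Regularized covariance of the labeled features.
    cov = phi_labeled.T @ phi_labeled + lam * np.eye(d)
    theta = np.linalg.solve(cov, phi_labeled.T @ rewards)
    return theta, cov

def pessimistic_rewards(phi_unlabeled, theta, cov, beta=1.0):
    """Predicted reward minus an uncertainty penalty (assumed form, not from the source)."""
    cov_inv = np.linalg.inv(cov)
    # Elliptical term sqrt(phi^T cov^{-1} phi): large where labeled data is scarce.
    bonus = np.sqrt(np.einsum("nd,de,ne->n", phi_unlabeled, cov_inv, phi_unlabeled))
    return phi_unlabeled @ theta - beta * bonus

# Synthetic data, purely for illustration.
rng = np.random.default_rng(0)
true_theta = np.array([1.0, -0.5, 0.2])
phi_l = rng.normal(size=(50, 3))                      # labeled features
r_l = phi_l @ true_theta + 0.01 * rng.normal(size=50) # noisy observed rewards
phi_u = rng.normal(size=(200, 3))                     # reward-free features

theta_hat, cov = fit_reward_ridge(phi_l, r_l)
r_pess = pessimistic_rewards(phi_u, theta_hat, cov, beta=0.5)
```

In this sketch the reward-free transitions receive labels that never exceed the plain regression prediction, so a downstream offline RL algorithm trained on the combined data would not be rewarded for visiting regions the labeled data does not cover. The penalty scale `beta` is a free parameter here.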

artificial intelligence, machine learning, reinforcement learning

Country:
- Europe > Germany
  - Baden-Württemberg > Freiburg (0.04)
- Asia
  - Middle East > Jordan (0.04)
  - China (0.04)

Genre:
- Research Report > New Finding (0.86)

Industry:
- Leisure & Entertainment > Games > Computer Games (0.93)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Reinforcement Learning (1.00)
  - Learning Graphical Models > Undirected Networks
    - Markov Models (0.34)
