$\pi2\text{vec}$: Policy Representations with Successor Features
Scarpellini, Gianluca, Konyushkova, Ksenia, Fantacci, Claudio, Paine, Tom Le, Chen, Yutian, Denil, Misha
arXiv.org Artificial Intelligence
Robot time is an important bottleneck in applying reinforcement learning in the real world. The lack of sufficient training data has driven progress in sim2real, offline reinforcement learning (offline RL), and data-efficient learning. However, these approaches do not address the data requirements of policy evaluation. Various proxy metrics have been introduced to replace evaluation on the real robotic system. For example, in sim2real we might measure performance in simulation (Lee et al., 2021), while in offline RL we can rely on Off-policy Policy Evaluation (OPE) methods (Precup, 2000; Li et al., 2011; Gulcehre et al., 2020; Fu et al., 2021). As we are usually interested in deploying a policy in the real world, recent works narrowed the problem by focusing on Offline Policy Selection (OPS), where the goal is to pick the best-performing policy from offline data. While these methods are useful for determining the coarse relative performance of policies, one still needs time on the real robot to obtain more reliable estimates.
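The successor features mentioned in the title are not defined in this excerpt; as a point of reference, in the standard formulation they are discounted sums of state-action features under a policy, as sketched below (the feature map $\phi$, weight vector $w$, and discount $\gamma$ are generic symbols, not notation taken from this abstract):

$$
\psi^{\pi}(s, a) = \mathbb{E}^{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, \phi(s_t, a_t) \,\middle|\, s_0 = s,\ a_0 = a\right],
\qquad
Q^{\pi}(s, a) = \psi^{\pi}(s, a)^{\top} w .
$$

Under this decomposition, a policy's expected return is linear in its successor features, which is what makes such representations a plausible proxy for comparing policies without additional robot time.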
Jun-16-2023