Offline Policy Evaluation for Reinforcement Learning with Adaptively Collected Data
Sunil Madhow, Dan Xiao, Ming Yin, Yu-Xiang Wang
–arXiv.org Artificial Intelligence
Offline Reinforcement Learning (RL), which seeks to perform standard RL tasks using a pre-existing dataset of interactions with an MDP, is a key frontier in the effort to make RL methods more widely applicable. The ability to incorporate existing data into RL algorithms is crucial in many promising application domains. In safety-critical areas, such as autonomous driving (Kiran et al., 2020), the randomized exploration that characterizes online algorithms is not ethically tolerable. Even in lower-stakes applications, such as advertising (Cai et al., 2017), naively adopting online algorithms could mean throwing away vast reserves of previously collected data. The development of efficient offline algorithms promises to broaden RL's applicability by allowing practitioners to exercise some much-needed domain-specific control over the training process. Given a dataset D of interactions with an MDP M, two tasks that we may hope to achieve in offline RL are Offline Policy Evaluation (Yin & Wang, 2020) and Offline Learning (Lange et al., 2012). In Offline Policy Evaluation (OPE), we seek to estimate the value of a target policy π under M. In Offline Learning (OL), the goal is to use D to find a good policy π ∈ Π, where Π is some policy class. In this paper, we largely focus on OPE.
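To make the OPE task concrete, below is a minimal sketch of one classical estimator, per-trajectory importance sampling; this is illustrative only and not necessarily the estimator analyzed in this paper. The trajectory format, the policy interface (functions mapping a state-action pair to a probability), and all names (importance_sampling_ope, pi_target, pi_behavior) are assumptions made for this example.

```python
import numpy as np

def importance_sampling_ope(trajectories, pi_target, pi_behavior, gamma=0.99):
    """Estimate the value of pi_target from data logged by pi_behavior.

    trajectories: list of trajectories, each a list of (state, action, reward).
    pi_target, pi_behavior: functions (state, action) -> probability of taking
        `action` in `state` under the target and behavior policies, respectively.
    Returns the per-trajectory importance-sampling estimate of v^{pi_target}.
    """
    estimates = []
    for traj in trajectories:
        weight, ret = 1.0, 0.0
        for t, (s, a, r) in enumerate(traj):
            # Reweight by the likelihood ratio of the observed action
            # under the target vs. behavior policy.
            weight *= pi_target(s, a) / pi_behavior(s, a)
            ret += (gamma ** t) * r
        estimates.append(weight * ret)
    return float(np.mean(estimates))

# Toy usage: single-step MDP with two actions, reward equal to the action taken.
rng = np.random.default_rng(0)
behavior = lambda s, a: 0.5                    # uniform logging policy
target = lambda s, a: 0.9 if a == 1 else 0.1   # policy we want to evaluate
data = [[(0, int(a), float(a))] for a in rng.integers(0, 2, size=10_000)]
print(importance_sampling_ope(data, target, behavior))  # approx. 0.9
```

Note that with adaptively collected data, the focus of this paper, the logging probabilities pi_behavior(s, a) vary across trajectories, so in practice they would be recorded at collection time rather than given by a single fixed function.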
Jun-24-2023