Reliable Off-policy Evaluation for Reinforcement Learning
Wang, Jie, Gao, Rui, Zha, Hongyuan
Reinforcement learning (RL) has achieved phenomenal success in games and robotics in the past decade, which has also stimulated enthusiasm for extending these techniques to other areas, including healthcare, education, autonomous driving, and recommendation systems. One of the major challenges in applying RL to these real-world applications, especially those that involve high-stakes environments, is the problem of off-policy evaluation (OPE): how can one evaluate a new policy before deployment, using only historical data collected from a different policy, known as the behavior policy? Indeed, for many practical applications, one may not have a faithful simulator of the domain from which a sufficient amount of data can be drawn to train the RL system, and it may not always be feasible to try out a new policy without causing unintended harm. For example, consider the problem of finding the best treatment plan for a patient, testing the performance of an automated driving system, or suggesting a personalized curriculum for a student. In those tasks, experimentation involves interactions with real people, so collecting data can be costly and, even worse, a bad policy can be risky or unethical and may result in severe consequences. Therefore, it is important for the RL system to be able to predict how well a new policy would perform without having to deploy it first. While most existing works on OPE aim to provide accurate point estimates for short-horizon problems as well as long- or infinite-horizon problems, it is equally important to quantify the uncertainty of the OPE point estimates for both safe exploration and optimistic planning.
Nov-8-2020