Off-policy Evaluation in Doubly Inhomogeneous Environments
Bian, Zeyu, Shi, Chengchun, Qi, Zhengling, Wang, Lan
arXiv.org Artificial Intelligence
Reinforcement learning (RL; Sutton and Barto, 2018) aims to optimize an agent's long-term reward by learning an optimal policy that determines the best action to take under every circumstance. RL is closely related to dynamic treatment regimens (DTR), or adaptive treatment strategies, in statistical research for precision medicine (Murphy, 2003; Robins, 2004; Qian and Murphy, 2011; Kosorok and Moodie, 2015; Tsiatis et al., 2019; Qi et al., 2020; Zhou et al., 2022a), which seek the treatment policy that maximizes patients' expected outcome in finite-horizon settings with a few treatment stages. Nevertheless, the statistical methods for DTR mentioned above typically cannot handle large- or infinite-horizon settings. They require the number of trajectories to tend to infinity to achieve estimation consistency, whereas RL can work even with a finite number of trajectories under certain conditions. In addition to precision medicine, RL has been applied to various fields, such as games (Silver et al., 2016), ridesharing (Xu et al., 2018), mobile health (Liao et al., 2021) and robotics (Levine et al., 2020).
Sep-7-2023
- Country:
- Asia > Middle East
- Europe > United Kingdom
- England > Cambridgeshire > Cambridge (0.04)
- North America > United States
- Florida > Palm Beach County
- Boca Raton (0.04)
- Massachusetts > Suffolk County
- Boston (0.04)
- New York (0.04)
- Pennsylvania > Philadelphia County
- Philadelphia (0.04)
- Genre:
- Research Report (1.00)
- Industry:
- Health & Medicine
- Pharmaceuticals & Biotechnology (0.93)
- Therapeutic Area (0.68)
- Transportation > Ground
- Road (0.34)
- Technology: