A Bayesian Approach to Online Learning for Contextual Restless Bandits with Applications to Public Health

Biyonka Liang, Lily Xu, Aparna Taneja, Milind Tambe, Lucas Janson

arXiv.org Artificial Intelligence 

In these settings, the underlying transition dynamics are often unknown a priori, requiring online reinforcement learning (RL). However, existing methods in online RL for RMABs cannot incorporate properties often present in real-world public health applications, such as contextual information and non-stationarity. We present Bayesian Learning for Contextual RMABs (BCoR), an online RL approach for RMABs that novelly combines techniques in Bayesian modeling with Thompson sampling to flexibly model a wide range of complex RMAB settings, such as contextual and non-stationary RMABs.

In public health programs such as communicable disease management (Tuldrà et al., 1999; Killian et al., 2019), prenatal and infant care (Hegde & Doshi, 2016; Ope, 2020; Bashingwa et al., 2021), and cancer prevention (Wells et al., 2011; Lee et al., 2019), beneficiaries may at any time enter an adhering (e.g., following their treatment regimen) or non-adhering (e.g., missing a treatment) state. As adherence is often vital for ensuring certain health outcomes, programs may allocate resources or interventions to patients at risk of drop-out from the program due to continued non-adherence. We can model this problem as an RMAB by representing each beneficiary as an arm, their adherence status as the state of the corresponding MDP, and the allocation of an intervention as the action.
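To make this formulation concrete, here is a minimal Python sketch of the beneficiary-as-arm model described above, paired with a Thompson-sampling-style action rule in the spirit of the Bayesian approach the abstract outlines. All names (`BeneficiaryArm`, `choose_actions`), the Beta priors, and the myopic index heuristic are illustrative assumptions for exposition, not BCoR's actual algorithm (which additionally handles context and non-stationarity).

```python
import numpy as np

class BeneficiaryArm:
    """One RMAB arm: a two-state MDP (0 = non-adhering, 1 = adhering)
    with a binary action (0 = no intervention, 1 = intervene). The
    transition probabilities are unknown, so P(adhere next | state,
    action) gets a Beta(1, 1) prior for each (state, action) pair.
    Hypothetical sketch, not BCoR's actual model."""

    def __init__(self, rng):
        self.rng = rng
        self.alpha = np.ones((2, 2))  # posterior successes per (state, action)
        self.beta = np.ones((2, 2))   # posterior failures per (state, action)
        self.state = 1                # assume beneficiaries start adhering

    def sample_transitions(self):
        # Thompson step: draw one plausible transition model from the
        # posterior; returns P(adhere next) with shape (state, action).
        return self.rng.beta(self.alpha, self.beta)

    def update(self, action, next_state):
        # Conjugate Beta-Bernoulli update after observing the transition.
        if next_state == 1:
            self.alpha[self.state, action] += 1
        else:
            self.beta[self.state, action] += 1
        self.state = next_state

def choose_actions(arms, budget):
    # Myopic index under the sampled model: expected gain in adherence
    # probability from intervening. A real RMAB policy (e.g., a Whittle
    # index) would replace this heuristic.
    gains = []
    for arm in arms:
        p = arm.sample_transitions()
        gains.append(p[arm.state, 1] - p[arm.state, 0])
    actions = np.zeros(len(arms), dtype=int)
    actions[np.argsort(gains)[-budget:]] = 1  # intervene on top-gain arms
    return actions

# Usage: one round with 100 beneficiaries and budget for 10 interventions.
rng = np.random.default_rng(0)
arms = [BeneficiaryArm(rng) for _ in range(100)]
actions = choose_actions(arms, budget=10)
```

In a full online loop, each arm's `update` would be called with the observed next state after every round, so the Beta posteriors concentrate over time and the Thompson draws naturally trade off exploration and exploitation across beneficiaries.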