Online Statistical Inference for Contextual Bandits via Stochastic Gradient Descent

Chen, Xi; Lai, Zehua; Li, He; Zhang, Yichen

arXiv.org Artificial Intelligence 

As the agent's choice is often influenced by additional covariates, also referred to as contexts, contextual bandit problems have gained renewed attention in the past decades (Woodroofe, 1979; Langford and Zhang, 2007, etc.). With the development of the internet and data technology, contextual bandit algorithms play an important role in sequential decision-making applications, such as online advertisement (Li et al., 2010), precision medicine (Kim et al., 2011), e-commerce (Qiang and Bayati, 2016; Chen et al., 2022), and public policy (Kasy and Sautmann, 2021). The decisions in these applications are often referred to as recommendations, treatments, interventions, and public orders, while the rewards can be healthcare outcomes, welfare utility, revenue, or any other measure of satisfaction with the decisions. Most contextual bandit algorithms are built with the goal of learning the best action under different contexts. In sequential settings, this goal is often formulated as minimizing the expected cumulative regret, that is, the gap between the reward the practitioner actually receives and the reward she would have received had she known the optimal action. While the importance of regret minimization is undisputed, reliable uncertainty quantification of the learned decision rule is also important in many of the featured applications. For example, in a personalized medicine application where the intervention decision is to choose the best medical treatment to optimize some health outcome, the risk associated with the selected treatment plays a critical, sometimes even life-threatening, role in decision-making. Such examples call for a valid and reliable statistical inference procedure that accompanies the decision-making process and provides guidance on policy interventions. Inferential studies not only help to raise risk alerts in recommendations, but also yield scientific knowledge on questions such as the effectiveness of medicines.
