On Transportability for Structural Causal Bandits

Min Woo Park, Sanghack Lee

arXiv.org Machine Learning

Intelligent agents equipped with causal knowledge can optimize their action spaces to avoid unnecessary exploration. The structural causal bandit framework provides a graphical characterization for identifying actions that cannot maximize rewards, leveraging prior knowledge of the underlying causal structure. While such knowledge enables an agent to estimate the expected rewards of certain actions from others during online interactions, there has been little guidance on how to transfer information inferred from arbitrary combinations of datasets collected under different conditions -- observational or experimental -- and from heterogeneous environments. In this paper, we investigate the structural causal bandit with transportability, where priors from the source environments are fused to enhance learning in the deployment setting. We demonstrate that it is possible to exploit invariances across environments to consistently improve learning. The resulting bandit algorithm achieves a sub-linear regret bound with an explicit dependence on the informativeness of prior data, and it may outperform standard bandit approaches that rely solely on online learning.
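One core idea in the structural causal bandit framework is that causal knowledge lets the agent prune arms that are provably non-optimal before any online learning happens. The sketch below is not the paper's algorithm; it is a minimal illustration, under assumed Bernoulli rewards, of running standard UCB1 over a restricted candidate set (standing in for the possibly-optimal intervention sets a causal analysis would produce). The arm names and means are hypothetical.

```python
import math
import random

def ucb_over_pruned_arms(arm_means, candidate_arms, horizon, seed=0):
    """Run UCB1 restricted to a pruned candidate arm set.

    arm_means: dict mapping arm name -> Bernoulli reward probability.
    candidate_arms: arms a (hypothetical) causal analysis marked as
        possibly optimal; all other arms are never explored.
    Returns (total_reward, pull_counts).
    """
    rng = random.Random(seed)
    counts = {a: 0 for a in candidate_arms}
    sums = {a: 0.0 for a in candidate_arms}
    total_reward = 0.0
    for t in range(1, horizon + 1):
        # Play each candidate arm once, then pick by the UCB1 index.
        untried = [a for a in candidate_arms if counts[a] == 0]
        if untried:
            arm = untried[0]
        else:
            arm = max(
                candidate_arms,
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total_reward += reward
    return total_reward, counts
```

Because exploration cost scales with the number of candidate arms, shrinking the candidate set (here, done by hand) is exactly where causal priors pay off; the paper's contribution concerns how data from heterogeneous source environments can justify such pruning and sharpen the reward estimates.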





Limitation of intervention not changing parent set: There are many settings in the empirical sciences where

Neural Information Processing Systems

We would like to thank the reviewers for their comments and constructive feedback. Below, we address the main issues raised and clarify some misunderstandings. The work of Yang et al. (2018) characterizes soft interventions in systems without latent variables, and Mooij et al. (2013) discussed interventions of this nature in the context of equilibrium in cyclic causal models. Usage of MAGs: the reviewer's observation only holds for hard interventions.




evaluations overly harsh and would ask reviewers to reconsider our paper in light of the clarifications provided below.

Neural Information Processing Systems

We thank the reviewers for their thoughtful feedback. We organize the reviewers' questions (Q) and provide answers below. Point #6 clarifies the questions raised under "Correctness": P(Y = 0 | do(π)) = 0.5. We appreciate the suggested references, but they are somewhat orthogonal to our problem.