



Approximate Allocation Matching for Structural Causal Bandits with Unobserved Confounders

Neural Information Processing Systems

The structural causal bandit provides a framework for online decision-making problems in which causal information is available. In each round, an agent applies an intervention (or no intervention) by setting certain variables to constants and receives a stochastic reward from a non-manipulable variable. Although the causal structure is given, the observational and interventional distributions of these random variables are unknown beforehand and can only be learned through interaction with the environment. Therefore, to maximize the expected cumulative reward, it is critical to balance exploration against exploitation. We assume that each random variable takes a finite number of distinct values, and we consider a semi-Markovian setting, in which random variables are affected by unobserved confounders.
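
To make the setting concrete, the following is a minimal sketch of one round-by-round interaction, not the paper's method. Everything in it is an assumption chosen for illustration: the class name `ToyCausalBandit`, the three-variable graph (an unobserved confounder U affecting both a manipulable variable X and the reward Y), and the reward probabilities 0.9 and 0.1. The learner is a plain UCB1 baseline that treats each intervention as an independent arm; it is swapped in only to show the exploration-exploitation loop and does not use the causal structure the way approximate allocation matching would.

```python
import math
import random


class ToyCausalBandit:
    """Hypothetical semi-Markovian SCM: U -> X, U -> Y, X -> Y, with U unobserved."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def pull(self, intervention):
        """Play one round.

        intervention is {} for no intervention, or {"X": 0} / {"X": 1} for do(X=x).
        Returns the binary reward Y; U is never revealed to the agent.
        """
        u = self.rng.randint(0, 1)        # unobserved confounder
        x = intervention.get("X", u)      # X copies U unless it is set by do(X=x)
        p = 0.9 if x == u else 0.1        # assumed reward probability given (X, U)
        return 1 if self.rng.random() < p else 0


def ucb1(env, arms, horizon):
    """Generic UCB1 over a finite list of interventions (illustrative baseline)."""
    counts = [0] * len(arms)
    sums = [0.0] * len(arms)
    for t in range(1, horizon + 1):
        if t <= len(arms):
            i = t - 1                     # play every arm once first
        else:
            i = max(
                range(len(arms)),
                key=lambda k: sums[k] / counts[k]
                + math.sqrt(2.0 * math.log(t) / counts[k]),
            )
        sums[i] += env.pull(arms[i])
        counts[i] += 1
    means = [s / c for s, c in zip(sums, counts)]
    return counts, means


# Arms: no intervention, do(X=0), do(X=1).
ARMS = [{}, {"X": 0}, {"X": 1}]
counts, means = ucb1(ToyCausalBandit(seed=0), ARMS, horizon=5000)
print(counts, [round(m, 3) for m in means])
```

In this toy model the empty intervention has expected reward 0.9, because X then agrees with U, while either do(X=x) breaks that dependence and yields 0.5. This illustrates why "no intervention" is a legitimate arm when unobserved confounders are present.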