Off-Policy Evaluation for Sequential Persuasion Process with Unobserved Confounding
S., Nishanth Venkatesh, Bang, Heeseung, Malikopoulos, Andreas A.
–arXiv.org Artificial Intelligence
In this paper, we expand the Bayesian persuasion framework to account for unobserved confounding variables in sender-receiver interactions. While traditional models typically assume that the receiver's belief updates follow Bayes' rule, real-world scenarios often involve hidden variables that affect the receiver's belief formation and decision-making. In our setting, the receiver's belief update is affected by an unobserved confounding variable. By reformulating this scenario as a Partially Observable Markov Decision Process (POMDP), we capture the sender's incomplete information regarding both the dynamics of the receiver's beliefs and the unobserved confounder. We prove that finding an optimal observation-based policy in this POMDP is equivalent to solving for an optimal signaling strategy in the original persuasion framework. Furthermore, we demonstrate how this reformulation facilitates the application of proximal learning for off-policy evaluation (OPE) in the persuasion process. This enables the sender to evaluate alternative signaling strategies using only observational data collected under a behavioral policy, eliminating the need for costly new experiments.

Strategic information sharing plays a critical role in economic interactions, policy design, and multi-agent systems [1]-[3]. Bayesian persuasion was first introduced by Kamenica and Gentzkow [4] as a framework for analyzing how a sender can strategically reveal information to influence a receiver's decisions. In the standard setting, a sender commits to an information disclosure policy before observing the state of the world. The receiver, after observing the sender's message, forms a posterior belief and takes an action that affects both the sender's and the receiver's utilities. Despite its theoretical elegance, Bayesian persuasion rests on assumptions that may not hold in practical settings. First, the framework presupposes that the sender possesses complete information about the receiver, including their observation process and every feature that influences their decision-making, such as their utility function.
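As a rough illustration of the standard setting described above (not the confounded model this paper proposes), the following sketch computes the receiver's Bayesian posterior after a signal and the resulting best-response action. All names (`posterior`, `best_response`, the two-state example and its numbers) are hypothetical and chosen only for illustration; they do not come from the paper.

```python
from typing import List


def posterior(prior: List[float], policy: List[List[float]], signal: int) -> List[float]:
    """Bayes' rule: P(state | signal) is proportional to prior[state] * policy[state][signal].

    policy[state][signal] is the probability that the sender emits `signal`
    when the true state is `state` (the committed disclosure policy).
    """
    joint = [p * row[signal] for p, row in zip(prior, policy)]
    total = sum(joint)
    if total == 0.0:
        raise ValueError("signal has zero probability under this policy")
    return [j / total for j in joint]


def best_response(belief: List[float], receiver_utility: List[List[float]]) -> int:
    """Index of the action with the highest expected utility under `belief`.

    receiver_utility[action][state] is the receiver's payoff (assumed known here).
    """
    expected = [sum(b * u for b, u in zip(belief, row)) for row in receiver_utility]
    return max(range(len(expected)), key=expected.__getitem__)


# Hypothetical two-state example: state 0 = "innocent", state 1 = "guilty".
prior = [0.7, 0.3]
# Signal 0 = "looks innocent", signal 1 = "looks guilty".
policy = [
    [0.6, 0.4],  # innocent: sometimes reported as "looks guilty"
    [0.0, 1.0],  # guilty: always reported as "looks guilty"
]
# Action 0 = acquit, action 1 = convict; the receiver wants to match the state.
receiver_utility = [
    [1.0, 0.0],  # acquit
    [0.0, 1.0],  # convict
]

belief_after_guilty_signal = posterior(prior, policy, signal=1)
action_after_guilty_signal = best_response(belief_after_guilty_signal, receiver_utility)
belief_after_innocent_signal = posterior(prior, policy, signal=0)
action_after_innocent_signal = best_response(belief_after_innocent_signal, receiver_utility)
```

In this toy example the sender needs to know `prior`, `policy`, and `receiver_utility` exactly to predict the receiver's action, which is the kind of complete-information assumption the text above questions.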
Apr-1-2025