Goto

Collaborating Authors

 Asia





Bias Detection via Signaling

Neural Information Processing Systems

We introduce and study the problem of detecting whether an agent is updating their prior beliefs given new evidence in an optimal way that is Bayesian, or whether they are biased towards their own prior. In our model, biased agents form posterior beliefs that are a convex combination of their prior and the Bayesian posterior, where the more biased an agent is, the closer their posterior is to the prior. Since we often cannot observe the agent's beliefs directly, we take an approach inspired by information design . Specifically, we measure an agent's bias by designing a signaling scheme and observing the actions the agent takes in response to different signals, assuming that the agent maximizes their own expected utility. Our goal is to detect bias with a minimum number of signals. Our main results include a characterization of scenarios where a single signal suffices and a computationally efficient algorithm to compute optimal signaling schemes.


Extracting Reward Functions from Diffusion Models

Neural Information Processing Systems

We consider the problem of extracting a reward function by comparing a decision-making diffusion model that models low-reward behavior and one that models high-reward behavior; a setting related to inverse reinforcement learning. We first define the notion of a relative reward function of two diffusion models and show conditions under which it exists and is unique.