Optimal and Fair Encouragement Policy Evaluation and Learning

Neural Information Processing Systems 

In consequential domains, it is often impossible to compel individuals to take treatment, so that optimal treatment assignments are merely suggestions when humans make the final treatment decisions. On the other hand, there can be heterogeneity both in the actual response to treatment and in the final treatment decisions made given recommendations. For example, in social services, a persistent puzzle is the gap in take-up of beneficial services among those who may benefit from them the most. When decision-makers have equity- or fairness-minded preferences over both access and average outcomes, the optimal decision rule changes due to these differing heterogeneity patterns. We study identification and improved/robust estimation under potential violations of positivity. We consider fairness constraints, such as demographic parity in treatment take-up, and other constraints via constrained optimization. We develop a two-stage, online-learning-based algorithm for solving over parametrized policy classes under general constraints and obtain variance-sensitive regret bounds. Our framework can be extended to handle algorithmic recommendations under an often-reasonable covariate-conditional exclusion restriction, using our robustness checks for lack of positivity in the recommendation.
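To make the constrained-optimization setting concrete, the sketch below shows a generic Lagrangian-style scheme for learning a parametrized (logistic) recommendation policy that maximizes a plug-in estimate of policy value subject to an approximate demographic-parity constraint on treatment take-up. This is only a minimal illustration, not the paper's estimator or its two-stage algorithm: the plug-in scores `psi_1`, `psi_0`, the estimated take-up probability `q`, the step sizes, and all variable names are illustrative assumptions, with primal gradient steps on the policy alternated with projected dual updates on the constraint multipliers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: covariates X, binary group indicator G, and plug-in scores.
n, d = 2000, 5
X = rng.normal(size=(n, d))
G = (rng.random(n) < 0.4).astype(int)
n1, n0 = int(G.sum()), int(n - G.sum())

# psi_1 / psi_0: hypothetical plug-in value scores under recommend / not recommend
# (in practice these would come from identification results, e.g. robust scores).
tau = 0.5 * (X @ rng.normal(size=d)) + 0.3 * G
psi_1 = tau + rng.normal(scale=0.5, size=n)
psi_0 = rng.normal(scale=0.5, size=n)

# q: hypothetical estimated probability of take-up when recommended (heterogeneous adherence).
q = 1.0 / (1.0 + np.exp(-(X[:, 0] + 0.5 - 1.0 * G)))

eps = 0.05         # allowed gap in expected take-up between the two groups
eta_theta = 2.0    # primal (policy) step size
eta_lam = 0.5      # dual (multiplier) step size
lam = np.zeros(2)  # multipliers for the two one-sided parity constraints
theta = np.zeros(d)

def policy_probs(theta, X):
    """Stochastic logistic recommendation policy pi_theta(recommend | x)."""
    return 1.0 / (1.0 + np.exp(-(X @ theta)))

for t in range(500):
    pi = policy_probs(theta, X)
    takeup = pi * q                                # expected take-up per unit
    m1, m0 = takeup[G == 1].mean(), takeup[G == 0].mean()
    g = np.array([m1 - m0 - eps, m0 - m1 - eps])   # constraints g(theta) <= 0

    # dL/dpi_i: value gain from recommending unit i, minus the parity penalty.
    w = np.where(G == 1, 1.0 / n1, -1.0 / n0)
    dL_dpi = (psi_1 - psi_0) / n - (lam[0] - lam[1]) * w * q

    # Primal step: gradient ascent on the Lagrangian with respect to theta.
    theta += eta_theta * (X.T @ (dL_dpi * pi * (1.0 - pi)))

    # Dual step: projected gradient ascent on the nonnegative multipliers.
    lam = np.maximum(0.0, lam + eta_lam * g)

pi = policy_probs(theta, X)
print("estimated policy value:", np.mean(pi * psi_1 + (1.0 - pi) * psi_0))
print("take-up gap:", abs((pi * q)[G == 1].mean() - (pi * q)[G == 0].mean()))
```

The design choice illustrated here is that the fairness constraint is imposed on take-up (recommendation times adherence) rather than on recommendations alone, which is what makes heterogeneous adherence enter the constraint as well as the objective.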