Improving Consequential Decision Making under Imperfect Predictions
Niki Kilbertus, Manuel Gomez-Rodriguez, Bernhard Schölkopf, Krikamol Muandet, Isabel Valera
Consequential decisions are increasingly informed by sophisticated data-driven predictive models. For accurate predictive models, deterministic threshold rules have been shown to be optimal in terms of utility, even under a variety of fairness constraints. However, consistently learning accurate models requires access to ground truth data. Unfortunately, in practice, some data can only be observed if a certain decision was taken. Collected data thus always depends on potentially imperfect historical decision policies, and deterministic threshold rules learned from such data are often suboptimal. We address this problem from the perspective of sequential policy learning. We first show that, if decisions are taken by a faulty deterministic policy, the outcomes observed under this policy are insufficient to improve it. We then describe how this undesirable behavior can be avoided using stochastic policies. Finally, we introduce a practical gradient-based algorithm to learn stochastic policies that effectively leverage the outcomes of decisions to improve over time. Experiments on both synthetic and real-world data illustrate our theoretical results and show the efficacy of our proposed algorithm.
Feb-8-2019
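To make the selective-labels setting concrete, here is a minimal sketch (not the paper's actual algorithm; all names, the logistic policy form, and the synthetic data are illustrative assumptions) of learning a stochastic decision policy with a score-function gradient, where an outcome is observed only when a positive decision is taken:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic population: one feature x; the true probability
# of a good outcome (e.g. loan repayment) increases with x.
n, cost = 5000, 0.5
x = rng.normal(size=n)
p_good = 1.0 / (1.0 + np.exp(-2.0 * x))

def policy_prob(theta, x):
    """Stochastic logistic policy: probability of a positive decision."""
    return 1.0 / (1.0 + np.exp(-(theta[0] + theta[1] * x)))

theta = np.zeros(2)
lr = 0.5
for _ in range(200):
    pi = policy_prob(theta, x)
    d = rng.random(n) < pi          # sampled decisions (stochastic policy)
    y = rng.random(n) < p_good      # true outcomes ...
    # Selective labels: y is only observed where d == 1; utility is
    # (y - cost) for accepted individuals and 0 for rejected ones.
    u = np.where(d, y.astype(float) - cost, 0.0)
    # Score-function (REINFORCE-style) gradient of expected utility;
    # terms with d == 0 vanish because their utility is 0.
    score = np.where(d, 1.0 - pi, -pi)   # d/dtheta0 of log pi(d | x)
    grad = np.array([np.mean(u * score), np.mean(u * score * x)])
    theta += lr * grad
```

Because the policy is stochastic, every individual retains some probability of acceptance, so outcomes keep arriving across the whole feature range and the gradient can correct an initially poor policy; a deterministic threshold rule would stop observing outcomes below its threshold entirely.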