Privacy Preserving Off-Policy Evaluation
Xie, Tengyang; Thomas, Philip S.; Miklau, Gerome
Many proposed applications of reinforcement learning (RL) involve the use of data that could contain sensitive information. For example, Raghu et al. [2017] proposed an application of RL and off-policy evaluation methods that uses people's medical records, and Theocharous et al. [2015] applied off-policy evaluation methods to user data collected by a bank in order to improve the targeting of advertisements. In examples like these, the data used by the RL systems is sensitive, and one should ensure that the methods applied to the data do not leak any sensitive information. Recently, Balle et al. [2016] showed how techniques from differential privacy can be used to ensure that (with high probability) policy evaluation methods for RL do not leak (much) sensitive information. In this paper, we extend their work in two ways.
Jan-31-2019