Learning from Mixtures of Private and Public Populations

Neural Information Processing Systems 

We initiate the study of a new model of supervised learning under privacy constraints. Imagine a medical study where a dataset is sampled from a population of both healthy and unhealthy individuals. Suppose healthy individuals have no privacy concerns (in such a case, we call their data ``public'') while the unhealthy individuals desire stringent privacy protection for their data. In this example, the population (data distribution) is a mixture of private (unhealthy) and public (healthy) sub-populations that could be very different. Inspired by the above example, we consider a model in which the population $\mathcal{D}$ is a mixture of two possibly distinct sub-populations: a private sub-population $\mathcal{D}_{\mathrm{prv}}$ of private and sensitive data, and a public sub-population $\mathcal{D}_{\mathrm{pub}}$ of data with no privacy concerns.
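To make the data model concrete, here is a minimal sketch of drawing a training set from such a mixture. The abstract does not specify a mixing weight, so this sketch assumes a hypothetical weight `alpha`, meaning that each point comes from the private sub-population with probability `alpha` and from the public one otherwise. It also assumes that each point carries a flag recording which sub-population it came from. The names `sample_mixture`, `toy_private`, and `toy_public`, as well as the two Gaussian sub-populations, are illustrative assumptions and not part of the paper.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple

# A sub-population sampler returns one labeled point (x, y).
Sampler = Callable[[random.Random], Tuple[float, int]]


@dataclass
class Example:
    x: float
    y: int
    is_private: bool  # True if the point came from the private sub-population


def sample_mixture(n: int, alpha: float, sample_prv: Sampler,
                   sample_pub: Sampler, rng: random.Random) -> List[Example]:
    """Draw n i.i.d. points from alpha * D_prv + (1 - alpha) * D_pub.

    alpha is a hypothetical mixing weight assumed for this sketch.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    data = []
    for _ in range(n):
        # First choose the sub-population, then draw a point from it.
        if rng.random() < alpha:
            x, y = sample_prv(rng)
            data.append(Example(x, y, True))
        else:
            x, y = sample_pub(rng)
            data.append(Example(x, y, False))
    return data


# Hypothetical toy sub-populations whose feature distributions differ a lot.
def toy_private(rng: random.Random) -> Tuple[float, int]:
    x = rng.gauss(2.0, 1.0)
    return x, int(x > 2.0)


def toy_public(rng: random.Random) -> Tuple[float, int]:
    x = rng.gauss(-2.0, 1.0)
    return x, int(x > -2.0)


if __name__ == "__main__":
    data = sample_mixture(1000, 0.3, toy_private, toy_public, random.Random(0))
    n_prv = sum(e.is_private for e in data)
    print(f"{n_prv} private and {len(data) - n_prv} public examples")
```

Under these assumptions, a learner would need to protect only the points with `is_private == True`, while the public points could be used without privacy constraints.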