Classification with Partially Private Features

Shen, Zeyu; Krishnaswamy, Anilesh; Kulkarni, Janardhan; Munagala, Kamesh

arXiv.org Artificial Intelligence 

Privacy of data has become increasingly important in large-scale machine learning applications, where data points correspond to individuals who seek privacy. Classifiers are often trained over data containing sensitive attributes about individuals: income, education, marital status, and so on. A well-accepted way to incorporate privacy into machine learning is the framework of differential privacy [9, 7, 10]. The key idea is to add noise either to individual data items or to the output of the classifier, so that the distribution over classifiers produced is mathematically close when an arbitrary individual is added to or removed from the dataset. This provides a quantifiable way in which sensitive data about any individual is kept information-theoretically secure during the classification process. All this comes at a cost: adding noise leads to a loss in accuracy of the classifier, and as we elaborate below, a large body of work has studied the privacy-accuracy trade-off both theoretically and empirically.
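For concreteness, the "mathematically close" requirement above is usually formalized as the standard epsilon-differential-privacy condition; the notation below (randomized mechanism $M$, neighboring datasets $D, D'$, privacy parameter $\varepsilon$) is the conventional one and is not taken from this abstract:

$$\Pr[M(D) \in S] \;\le\; e^{\varepsilon}\, \Pr[M(D') \in S] \quad \text{for all measurable } S \text{ and all } D, D' \text{ differing in one individual.}$$

Smaller $\varepsilon$ means the two output distributions are harder to distinguish, i.e., stronger privacy, typically at the price of more injected noise and lower accuracy.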