Understanding Training-Data Leakage from Gradients in Neural Networks for Image Classification

Cangxiong Chen, Neill D. F. Campbell

arXiv.org Machine Learning 

In federated learning [6] of deep learning models for supervised tasks such as image classification and segmentation, gradients from each participant are either shared with another participant or aggregated at a central server. In many applications of federated learning, the privacy of the training data needs to be protected, and we want guarantees that a malicious participant cannot fully recover the training data of other participants from the shared gradients and knowledge of the model architecture. Such a guarantee is indispensable for removing the barriers to applying federated learning in tasks such as image segmentation in film post-production, where the training data are usually under strict IP protection. In this scenario, it is the training data themselves that need to be protected, rather than the information we can infer about them. To develop protection mechanisms, we need an appropriate understanding of the source of the training-data leakage. In this work, we are concerned with the following question: for a deep learning model performing image classification, what determines the success of reconstructing a training image given its label, its gradients from training, and the model architecture? We focus on the case of reconstructing a single target image from the gradients of an untrained model. Although our work was inspired by R-GAP [10], our method COPA (combined optimisation attack) provides a more general theoretical framework for training-data reconstruction, particularly for convolutional layers. Compared with DLG [11], COPA provides more insight into the mechanism of training-data leakage through a more informative formulation of the objective function, making the source of the constraints clearer.
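
To make the attack setting concrete, the listing below sketches a DLG-style gradient-matching reconstruction [11] in PyTorch, not the COPA method itself: a dummy image is optimised so that the gradients it induces on an untrained model match the gradients shared for a single target image with a known label. The toy architecture, image size, and optimiser settings are illustrative assumptions only.

    # Minimal sketch of a DLG-style gradient-matching attack [11] (not COPA):
    # recover a single training image from its shared gradients, assuming the
    # label and the (untrained) model architecture are known to the attacker.
    import torch
    import torch.nn as nn

    torch.manual_seed(0)

    model = nn.Sequential(                      # untrained toy classifier (assumed)
        nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
        nn.Flatten(), nn.Linear(8 * 28 * 28, 10),
    )
    loss_fn = nn.CrossEntropyLoss()

    # Gradients "shared" by the victim for one training image.
    x_target = torch.rand(1, 1, 28, 28)
    y_target = torch.tensor([3])
    target_grads = [g.detach() for g in torch.autograd.grad(
        loss_fn(model(x_target), y_target), model.parameters())]

    # Attacker: start from noise and optimise the dummy image so that its
    # gradients match the shared ones (label assumed known).
    x_dummy = torch.randn(1, 1, 28, 28, requires_grad=True)
    opt = torch.optim.LBFGS([x_dummy])

    for _ in range(50):
        def closure():
            opt.zero_grad()
            dummy_grads = torch.autograd.grad(
                loss_fn(model(x_dummy), y_target),
                model.parameters(), create_graph=True)
            # L2 distance between dummy and shared gradients.
            grad_diff = sum(((dg - tg) ** 2).sum()
                            for dg, tg in zip(dummy_grads, target_grads))
            grad_diff.backward()
            return grad_diff
        opt.step(closure)

    print("mean absolute reconstruction error:",
          (x_dummy - x_target).abs().mean().item())

COPA replaces this purely optimisation-based objective with a more informative formulation, layer by layer, which is what exposes where the constraints on the reconstruction come from.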