Privacy Distillation: Reducing Re-identification Risk of Multimodal Diffusion Models
Fernandez, Virginia, Sanchez, Pedro, Pinaya, Walter Hugo Lopez, Jacenków, Grzegorz, Tsaftaris, Sotirios A., Cardoso, Jorge
arXiv.org Artificial Intelligence
Knowledge distillation in neural networks refers to compressing a large model or dataset into a smaller version of itself. We introduce Privacy Distillation, a framework that allows a text-to-image generative model to teach another model without exposing it to identifiable data. Here, we are interested in the privacy issue faced by a data provider who wishes to share their data via a multimodal generative model. A question that immediately arises is: "How can a data provider ensure that the generative model is not leaking identifiable information about a patient?" Our solution consists of (1) training a first diffusion model on real data; (2) generating a synthetic dataset using this model and filtering it to exclude images that carry a re-identification risk; and (3) training a second diffusion model on the filtered synthetic data only. We show that datasets sampled from models trained with Privacy Distillation can effectively reduce re-identification risk whilst maintaining downstream performance.
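The three-step pipeline above can be sketched in a few lines. The sketch below is a hypothetical illustration, not the paper's implementation: images are stood in for by feature embeddings, and the re-identification filter of step (2) is approximated by a nearest-neighbour cosine-similarity threshold against the real data (the paper instead uses a dedicated re-identification model). The function names, the threshold value, and the toy data are all assumptions for illustration.

```python
import numpy as np


def reidentification_risk(synth_emb: np.ndarray, real_embs: np.ndarray) -> float:
    """Proxy risk score: cosine similarity to the closest real embedding.

    This stands in for the paper's learned re-identification model;
    a synthetic sample that is nearly identical to some real sample
    scores close to 1.0.
    """
    sims = real_embs @ synth_emb / (
        np.linalg.norm(real_embs, axis=1) * np.linalg.norm(synth_emb) + 1e-8
    )
    return float(sims.max())


def privacy_filter(synth_embs, real_embs, threshold=0.95):
    # Step (2): keep only synthetic samples whose proxy risk is below
    # the (hypothetical) threshold; the survivors would then be used to
    # train the second diffusion model in step (3).
    return [e for e in synth_embs if reidentification_risk(e, real_embs) < threshold]


# Toy data: 64-d "embeddings" standing in for images from step (1)'s model.
rng = np.random.default_rng(0)
real = rng.normal(size=(100, 64))
near_copies = real[:5] + 0.001 * rng.normal(size=(5, 64))  # risky samples
synth = np.vstack([near_copies, rng.normal(size=(50, 64))])

kept = privacy_filter(list(synth), real)
# The 5 near-copies of real images are filtered out; the genuinely
# novel samples pass and form the distillation training set.
```

In practice the filter's threshold trades privacy against data yield: a stricter threshold discards more synthetic images, which the abstract's downstream-performance claim suggests the second model can tolerate.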
Jun-2-2023