On the Impact of Noise in Differentially Private Text Rewriting
Meisenbacher, Stephen, Chevli, Maulik, Matthes, Florian
– arXiv.org Artificial Intelligence
The field of text privatization often leverages the notion of Differential Privacy (DP) to provide formal guarantees in the rewriting or obfuscation of sensitive textual data. A common and nearly ubiquitous form of DP application necessitates the addition of calibrated noise to vector representations of text, either at the data- or model-level, which is governed by the privacy parameter $\varepsilon$. However, noise addition almost inevitably leads to considerable utility loss, highlighting one major drawback of DP in NLP. In this work, we introduce a new sentence-infilling privatization technique, and we use this method to explore the effect of noise in DP text rewriting. We empirically demonstrate that non-DP privatization techniques excel in utility preservation and can find an acceptable empirical privacy-utility trade-off, yet cannot outperform DP methods in empirical privacy protection. Our results highlight the significant impact of noise in current DP rewriting mechanisms, leading to a discussion of the merits and challenges of DP in NLP, as well as the opportunities that non-DP methods present.
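The noise-addition mechanism described in the abstract can be illustrated with a minimal sketch. The snippet below is not the authors' method; it shows one common construction from the metric-DP text literature, where an embedding vector is perturbed with multivariate Laplace-style noise whose density is proportional to $\exp(-\varepsilon \lVert z \rVert)$, sampled as a uniform direction times a Gamma-distributed magnitude. Function and parameter names are illustrative.

```python
import numpy as np

def noisy_embedding(vec, epsilon, rng=None):
    """Perturb an embedding vector with d-dimensional Laplace-style noise.

    The noise density is proportional to exp(-epsilon * ||z||), a standard
    construction in metric-DP text privatization. Smaller epsilon means
    stronger privacy and larger expected noise magnitude (d / epsilon).
    """
    rng = np.random.default_rng() if rng is None else rng
    d = vec.shape[0]
    # Direction: uniform on the unit sphere (normalized Gaussian sample).
    direction = rng.normal(size=d)
    direction /= np.linalg.norm(direction)
    # Magnitude: Gamma(shape=d, scale=1/epsilon) yields the desired density.
    magnitude = rng.gamma(shape=d, scale=1.0 / epsilon)
    return vec + magnitude * direction
```

Lowering $\varepsilon$ inflates the expected noise norm ($d/\varepsilon$), which is precisely the utility-degrading effect the paper investigates: the perturbed vector may land near a very different word or sentence representation.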
Jan-31-2025