1-Diffractor: Efficient and Utility-Preserving Text Obfuscation Leveraging Word-Level Metric Differential Privacy

Meisenbacher, Stephen, Chevli, Maulik, Matthes, Florian

May-2-2024–arXiv.org Artificial Intelligence

The study of privacy-preserving Natural Language Processing (NLP) has gained increasing attention in recent years. One promising avenue is the integration of Differential Privacy into NLP, which has brought about innovative methods in a variety of application settings. Of particular note are $\textit{word-level Metric Local Differential Privacy (MLDP)}$ mechanisms, which obfuscate potentially sensitive input text by performing word-by-word $\textit{perturbations}$. Although these methods have shown promising results in empirical tests, they have two major drawbacks: (1) the inevitable loss of utility due to the addition of noise, and (2) the computational expense of running these mechanisms on high-dimensional word embeddings. In this work, we aim to address these challenges by proposing $\texttt{1-Diffractor}$, a new mechanism that achieves substantial speedups over previous mechanisms while still demonstrating strong utility- and privacy-preserving capabilities. We evaluate $\texttt{1-Diffractor}$ for utility on several NLP tasks, for theoretical and task-based privacy, and for efficiency in terms of speed and memory. $\texttt{1-Diffractor}$ shows significant improvements in efficiency while maintaining competitive utility and privacy scores across all comparative tests against previous MLDP mechanisms. Our code is available at: https://github.com/sjmeis/Diffractor.
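
To make the idea of word-by-word perturbation concrete, the following is a minimal sketch of a generic word-level MLDP mechanism of the kind the abstract refers to as previous work: each word's embedding is perturbed with noise calibrated to a privacy parameter epsilon, and the nearest vocabulary word is emitted in its place. It is not the $\texttt{1-Diffractor}$ mechanism itself, whose details the abstract does not give. The toy two-dimensional embeddings, the function names (`sample_noise`, `perturb_word`, `obfuscate`), and the handling of out-of-vocabulary words are all assumptions made for illustration.

```python
import math
import random

# Hypothetical toy 2-D embeddings (assumption); real mechanisms operate on
# pretrained, high-dimensional word embeddings.
TOY_EMBEDDINGS = {
    "doctor": (0.9, 0.1),
    "nurse": (0.8, 0.2),
    "hospital": (0.7, 0.4),
    "banana": (-0.8, 0.5),
    "apple": (-0.7, 0.6),
}


def sample_noise(dim, epsilon, rng):
    """Sample a noise vector with density proportional to exp(-epsilon * ||z||).

    The direction is uniform on the unit sphere (a normalized Gaussian vector)
    and the magnitude follows a Gamma(dim, 1/epsilon) distribution.
    """
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in direction))
    radius = rng.gammavariate(dim, 1.0 / epsilon)
    return [radius * x / norm for x in direction]


def perturb_word(word, epsilon, rng, embeddings=TOY_EMBEDDINGS):
    """Replace a word with the vocabulary word nearest to its noisy embedding."""
    if word not in embeddings:
        # Assumption for this sketch only: unknown words pass through
        # unchanged, which offers them no protection.
        return word
    vec = embeddings[word]
    noise = sample_noise(len(vec), epsilon, rng)
    noisy = [v + n for v, n in zip(vec, noise)]
    return min(embeddings, key=lambda w: math.dist(embeddings[w], noisy))


def obfuscate(text, epsilon, seed=None):
    """Perturb a whitespace-tokenized text one word at a time."""
    rng = random.Random(seed)
    return " ".join(perturb_word(w, epsilon, rng) for w in text.split())


if __name__ == "__main__":
    # Smaller epsilon means more noise, so more words tend to be replaced.
    for eps in (0.5, 5.0, 50.0):
        print(eps, obfuscate("doctor nurse banana", eps, seed=0))
```

The nearest-neighbor search over the whole vocabulary for every word is what makes this style of mechanism costly on large, high-dimensional embedding models, which is the second drawback named in the abstract. The added noise, which can swap a word for an unrelated one at low epsilon, corresponds to the first.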

computational linguistic, differential privacy, mechanism

Country:
- South America > Colombia
  - Meta Department > Villavicencio (0.04)
- North America
  - United States
    - Washington > King County
      - Seattle (0.04)
    - Texas > Harris County
      - Houston (0.04)
    - New York > New York County
      - New York City (0.04)
    - Minnesota > Hennepin County
      - Minneapolis (0.14)
    - California > Los Angeles County
      - Los Angeles (0.04)
  - Canada > Ontario
    - Toronto (0.04)
- Europe
  - Austria > Vienna (0.14)
  - Czechia > Prague (0.04)
  - Germany > Bavaria
    - Upper Bavaria > Munich (0.04)
  - Slovenia > Central Slovenia
    - Municipality of Ljubljana > Ljubljana (0.04)
  - Portugal > Porto
    - Porto (0.05)
  - Italy > Tuscany
    - Florence (0.04)
  - Denmark > Capital Region
    - Copenhagen (0.04)
  - Ireland > Leinster
    - County Dublin > Dublin (0.04)
  - Belgium > Brussels-Capital Region
    - Brussels (0.04)
- Asia
  - Singapore (0.04)
  - Indonesia > Bali (0.04)
  - China > Hong Kong (0.04)
  - Middle East
    - Jordan (0.04)
    - Qatar > Ad-Dawhah
      - Doha (0.04)

Genre:
- Research Report > New Finding (0.93)

Industry:
- Information Technology > Security & Privacy (1.00)
- Media (0.69)
- Leisure & Entertainment (0.69)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language (1.00)
  - Machine Learning > Neural Networks (0.46)
