Disentangling the Linguistic Competence of Privacy-Preserving BERT

Arnold, Stefan, Kemmerzell, Nils, Schreiner, Annika

Oct-17-2023–arXiv.org Artificial Intelligence

Differential Privacy (DP) has been tailored to address the unique challenges of text-to-text privatization. However, text-to-text privatization is known to degrade the performance of language models trained on perturbed text. Applying a series of interpretation techniques to the internal representations extracted from BERT trained on perturbed pre-text, we aim to disentangle, at the linguistic level, the distortion induced by differential privacy. Experimental results from a representational similarity analysis indicate that the overall similarity of internal representations is substantially reduced. Using probing tasks to unpack this dissimilarity, we find evidence that text-to-text privatization affects linguistic competence across several formalisms: the model still encodes localized properties of words but falls short at encoding the contextual relationships between spans of words.
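
As a rough illustration of what a representational similarity analysis computes, the sketch below compares two sets of representations of the same inputs by correlating their pairwise dissimilarity structure. It is only a minimal sketch under stated assumptions: the abstract does not say which similarity measure the authors use, the function names (`rdm`, `rank`, `rsa_score`) are hypothetical, and random arrays stand in for representations that would in practice be extracted from BERT trained on clean and on perturbed text.

```python
import numpy as np


def rdm(reps):
    """Representational dissimilarity matrix: 1 - Pearson correlation between rows.

    reps has shape (n_inputs, hidden_dim); the result is (n_inputs, n_inputs).
    """
    return 1.0 - np.corrcoef(reps)


def rank(x):
    """Ordinal ranks of a 1-D array (ties broken by position; fine for continuous data)."""
    order = np.argsort(x)
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(len(x))
    return ranks


def rsa_score(reps_a, reps_b):
    """Spearman-style correlation between the upper triangles of two RDMs."""
    iu = np.triu_indices(reps_a.shape[0], k=1)  # skip the diagonal and mirrored half
    a = rdm(reps_a)[iu]
    b = rdm(reps_b)[iu]
    return float(np.corrcoef(rank(a), rank(b))[0, 1])


# Stand-in data (assumption): 20 inputs, 16-dimensional representations.
rng = np.random.default_rng(0)
clean = rng.normal(size=(20, 16))
# Heavy additive noise stands in for the distortion caused by privatization.
noisy = clean + rng.normal(scale=5.0, size=clean.shape)

same = rsa_score(clean, clean)       # identical representations
perturbed = rsa_score(clean, noisy)  # distorted representations
print(f"clean vs clean: {same:.3f}")
print(f"clean vs noisy: {perturbed:.3f}")
```

A score near 1 means the two models arrange the inputs similarly in representation space. A lower score for the perturbed case corresponds to the kind of reduced similarity the abstract reports, although the numbers here come from synthetic data and say nothing about the paper's actual results.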

Country:
- North America
  - United States > Washington
    - King County > Seattle (0.04)
  - Canada > Ontario
    - Toronto (0.04)
- Europe > Germany
  - Bavaria > Middle Franconia > Nuremberg (0.04)

Genre:
- Research Report (0.64)

Industry:
- Information Technology > Security & Privacy (0.46)

Technology:
- Information Technology
  - Data Science > Data Mining (0.83)
  - Artificial Intelligence
    - Natural Language
      - Grammars & Parsing (0.71)
      - Machine Translation (0.68)
    - Machine Learning > Neural Networks
      - Deep Learning (0.68)
