Disentangling the Linguistic Competence of Privacy-Preserving BERT
Arnold, Stefan; Kemmerzell, Nils; Schreiner, Annika
arXiv.org Artificial Intelligence
Differential Privacy (DP) has been tailored to address the unique challenges of text-to-text privatization. However, text-to-text privatization is known to degrade the performance of language models trained on perturbed text. Employing a series of interpretation techniques on the internal representations extracted from BERT trained on perturbed pre-text, we aim to disentangle, at the linguistic level, the distortion induced by differential privacy. Experimental results from a representational similarity analysis indicate that the overall similarity of internal representations is substantially reduced. Using probing tasks to unpack this dissimilarity, we find evidence that text-to-text privatization affects linguistic competence across several formalisms: the model still encodes localized properties of words but falls short in encoding the contextual relationships between spans of words.
Oct-17-2023
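The abstract does not say how the representational similarity analysis is configured. As a minimal sketch, it assumes cosine similarity between per-input representations and a Pearson correlation between the two resulting similarity matrices. This is a common RSA setup, but not necessarily the authors'. Under those assumptions, comparing a BERT trained on original text with one trained on perturbed text could look like the code below. The function names `similarity_matrix` and `rsa_score` are hypothetical, and so are the random arrays that stand in for extracted representations. The Gaussian noise is a generic distortion, not the paper's DP mechanism.

```python
import numpy as np


def similarity_matrix(reps: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarities between the rows of an (n_inputs, dim) array."""
    norms = np.linalg.norm(reps, axis=1, keepdims=True)
    unit = reps / norms
    return unit @ unit.T


def rsa_score(reps_a: np.ndarray, reps_b: np.ndarray) -> float:
    """Pearson correlation between the upper triangles of two similarity matrices.

    Hypothetical helper: row i of reps_a and row i of reps_b must encode the
    same input; the hidden dimensions of the two models may differ.
    """
    if reps_a.shape[0] != reps_b.shape[0]:
        raise ValueError("both representation sets must cover the same inputs")
    sim_a = similarity_matrix(reps_a)
    sim_b = similarity_matrix(reps_b)
    # Exclude the diagonal: self-similarity is always 1 and would inflate the score.
    rows, cols = np.triu_indices(reps_a.shape[0], k=1)
    return float(np.corrcoef(sim_a[rows, cols], sim_b[rows, cols])[0, 1])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical stand-ins for representations of 50 inputs from two models.
    reps_clean = rng.normal(size=(50, 64))
    # An orthogonal rotation leaves every cosine similarity unchanged.
    q, _ = np.linalg.qr(rng.normal(size=(64, 64)))
    print(rsa_score(reps_clean, reps_clean @ q))
    # Gaussian noise (a generic stand-in, not the paper's DP mechanism)
    # distorts the similarity structure and lowers the score.
    reps_noisy = reps_clean + rng.normal(scale=2.0, size=reps_clean.shape)
    print(rsa_score(reps_clean, reps_noisy))
```

The score compares second-order structure (which inputs look alike to each model) rather than raw coordinates. For that reason it does not require the two models' representation spaces to be aligned, which makes RSA-style measures a natural fit for comparing separately trained models.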
- Country:
  - North America
    - United States > Washington
      - King County > Seattle (0.04)
    - Canada > Ontario
      - Toronto (0.04)
  - Europe > Germany
    - Bavaria > Middle Franconia > Nuremberg (0.04)
- Genre:
  - Research Report (0.64)
- Industry:
  - Information Technology > Security & Privacy (0.46)
- Technology: