DP-NMT: Scalable Differentially-Private Machine Translation
Timour Igamberdiev, Doan Nam Long Vu, Felix Künnecke, Zhuo Yu, Jannik Holmer, Ivan Habernal
– arXiv.org Artificial Intelligence
Neural machine translation (NMT) is a widely popular text generation task, yet there is a considerable research gap in the development of privacy-preserving NMT models, despite significant data privacy concerns for NMT systems. Differentially private stochastic gradient descent (DP-SGD) is a popular method for training machine learning models with concrete privacy guarantees; however, the implementation specifics of training a model with DP-SGD are not always made clear in existing work, with differing software libraries used and code bases that are not always public, leading to reproducibility issues. To tackle this, we introduce DP-NMT, an open-source framework for carrying out research on privacy-preserving NMT with DP-SGD, bringing together numerous models, datasets, and evaluation metrics in one systematic software package. Our goal is to provide a platform for researchers to advance the development of privacy-preserving NMT systems, keeping the specific details of the DP-SGD algorithm transparent and intuitive to implement. We run a set of experiments on datasets from both general and privacy-related domains to demonstrate our framework in use. We make our framework publicly available and welcome feedback from the community.
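For context, the core DP-SGD step the abstract refers to can be summarized as per-example gradient clipping followed by calibrated Gaussian noise. Below is a minimal, illustrative sketch written in JAX; it is not the DP-NMT API, and the loss function, parameter pytree, and all hyperparameter values are placeholder assumptions.

```python
# Minimal DP-SGD sketch in JAX (illustrative only; not the DP-NMT API).
# loss_fn, the parameter pytree, and all hyperparameters are assumptions.
import jax
import jax.numpy as jnp

def loss_fn(params, x, y):
    # Placeholder per-example loss; a real NMT model would go here.
    pred = x @ params["w"]
    return jnp.mean((pred - y) ** 2)

def dp_sgd_step(params, batch_x, batch_y, key,
                l2_clip=1.0, noise_multiplier=1.0, lr=0.1):
    batch_size = batch_x.shape[0]

    # 1) Per-example gradients: one gradient pytree per example, stacked on axis 0.
    grads = jax.vmap(jax.grad(loss_fn), in_axes=(None, 0, 0))(params, batch_x, batch_y)

    # 2) Global L2 norm of each example's gradient across all parameters.
    sq_norms = jax.tree_util.tree_map(
        lambda g: jnp.sum(g.reshape(batch_size, -1) ** 2, axis=1), grads)
    norms = jnp.sqrt(sum(jax.tree_util.tree_leaves(sq_norms)))

    # 3) Clip each example's gradient to norm <= l2_clip and sum over the batch.
    scale = jnp.minimum(1.0, l2_clip / (norms + 1e-12))
    clipped_sum = jax.tree_util.tree_map(
        lambda g: jnp.tensordot(scale, g, axes=(0, 0)), grads)

    # 4) Add Gaussian noise calibrated to the clipping bound, then average.
    leaves, treedef = jax.tree_util.tree_flatten(clipped_sum)
    keys = jax.random.split(key, len(leaves))
    noisy = jax.tree_util.tree_unflatten(treedef, [
        (g + noise_multiplier * l2_clip * jax.random.normal(k, g.shape)) / batch_size
        for g, k in zip(leaves, keys)
    ])

    # 5) Plain SGD update with the privatized gradient.
    return jax.tree_util.tree_map(lambda p, g: p - lr * g, params, noisy)
```

The per-example gradient computation via `jax.vmap` is the step that distinguishes DP-SGD from standard SGD and dominates its overhead; frameworks differ mainly in how they implement and scale this step, which is the kind of implementation detail the paper aims to make transparent.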
Nov-24-2023
- Country:
  - Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
  - North America > United States (0.28)
- Genre:
  - Research Report (0.64)
- Industry:
  - Information Technology > Security & Privacy (1.00)