Training Natural Language Processing Models on Encrypted Text for Enhanced Privacy
Davut Emre Tasar, Ceren Ocal Tasar
arXiv.org Artificial Intelligence
With the increasing use of cloud-based services for training and deploying machine learning models, data privacy has become a major concern. This is particularly important for natural language processing (NLP) models, which often process sensitive information such as personal communications and confidential documents. In this study, we propose a method for training NLP models on encrypted text data to mitigate data privacy concerns while maintaining performance comparable to models trained on non-encrypted data. We demonstrate our method using two architectures, Doc2Vec+XGBoost and Doc2Vec+LSTM, and evaluate the models on the 20 Newsgroups dataset. Our results indicate that the encrypted and non-encrypted models achieve comparable performance, suggesting that our encryption method preserves data privacy without sacrificing model accuracy. To allow replication of our experiments, we provide a Colab notebook at the following address: https://t.ly/lR-TP
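The abstract does not specify the encryption scheme. A minimal sketch of one plausible approach is deterministic token-level substitution: each word is mapped to a fixed opaque token, so word co-occurrence statistics (what Doc2Vec learns from) survive while the surface text is hidden. The key name and helper below are illustrative assumptions, not the authors' implementation.

```python
import hashlib

def encrypt_tokens(text, key="demo-key"):
    # Assumption for illustration: deterministic token substitution.
    # Each (key, word) pair always yields the same opaque token, so
    # co-occurrence structure is preserved for embedding training.
    return " ".join(
        hashlib.sha256((key + w.lower()).encode()).hexdigest()[:12]
        for w in text.split()
    )

plain = "the quick brown fox"
enc = encrypt_tokens(plain)
# The same word maps to the same token across documents, so a
# Doc2Vec model trained on encrypted corpora sees the same
# distributional signal as one trained on plaintext.
assert encrypt_tokens("the fox").split()[0] == enc.split()[0]
```

In such a scheme, the downstream pipeline (Doc2Vec embedding followed by an XGBoost or LSTM classifier) is unchanged; only the vocabulary is replaced by opaque tokens. Note that deterministic substitution leaks frequency information, which is one reason comparable accuracy is achievable.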
May-2-2023