A Survey on Private Transformer Inference

Li, Yang, Zhou, Xinyu, Wang, Yitong, Qian, Liangxin, Zhao, Jun

arXiv.org Artificial Intelligence 

For instance, both ChatGPT [42] and Bing [40] have made the power of transformer-based models widely accessible, democratizing advanced AI capabilities. These models adeptly leverage attention mechanisms [55] to capture long-range dependencies among input tokens, allowing them to accurately model contextual information. Moreover, unlike traditional task-specific learning approaches, large transformer models (e.g., GPT [46] and BERT [10]) are pretrained on vast quantities of unlabeled text and are directly applicable to a wide variety of tasks such as sentiment analysis, language translation, content generation, and question answering.

However, deploying large transformers still carries certain risks, particularly regarding privacy [35, 52]. Most popular transformer models operate under a paradigm known as Machine Learning as a Service (MLaaS), in which a server provides the model and inference services to users who own the data. For instance, OpenAI offers ChatGPT as an online platform together with remote APIs for developers, allowing users to access its services by submitting prompts or messages. Nevertheless, this paradigm raises privacy concerns: users must transmit their private data to a company's server and have no direct control over how that data is handled. They must trust that the server processes the data honestly and abides by the agreed terms of service, yet the server could misuse the data by processing it without authorization, storing it indefinitely, or even selling it to third parties.
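To make the attention mechanism mentioned above concrete, the following is a minimal sketch of scaled dot-product attention in the standard formulation of [55]; the function name, toy shapes, and NumPy implementation are illustrative choices, not taken from the surveyed systems:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Illustrative scaled dot-product attention (single head).

    Q, K, V: arrays of shape (seq_len, d_k); each row is one token's
    query, key, or value vector.
    """
    d_k = K.shape[-1]
    # Pairwise affinities between every query and every key,
    # scaled by sqrt(d_k) to keep softmax gradients well-behaved.
    scores = Q @ K.T / np.sqrt(d_k)
    # Numerically stable softmax over the key dimension: each row of
    # `weights` is a distribution over all input positions.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output token is a weighted mixture of all value vectors,
    # which is how attention captures long-range dependencies.
    return weights @ V
```

Because every output position attends over the entire sequence, a token's representation can depend on arbitrarily distant context, which is the property the introduction refers to.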