Hide Your Malicious Goal in Benign Narratives: Jailbreak Large Language Models through Logic Chain Injection
Zhilong Wang, Yebo Cao, Peng Liu
Large Language Models (LLMs) such as BERT [6] (Bidirectional Encoder Representations from Transformers) by Devlin et al. and GPT [11] (Generative Pre-trained Transformer) by Radford et al. have revolutionized the field of Natural Language Processing (NLP) with their exceptional capabilities, setting new performance standards across a wide range of tasks. Owing to their strong generative capability, LLMs are widely deployed as the backend for real-world applications, referred to as LLM-Integrated Applications. For instance, Microsoft uses GPT-4 as the service backend for the new Bing Search [1]; OpenAI hosts applications such as ChatWithPDF and AskTheCode that use GPT-4 for tasks such as text processing, code interpretation, and product recommendation [2, 3]; and Google deploys Bard, powered by PaLM 2.

In general, to accomplish a task, an LLM-Integrated Application needs two inputs: an instruction prompt, which tells the backend LLM how to perform the task, and a data prompt, which is the data to be processed in the task. The instruction prompt can be provided by a user or by the application itself, whereas the data prompt is often obtained from external resources such as emails and webpages on the Internet. The application queries the backend LLM with the instruction prompt and the data prompt, and returns the LLM's response to the user.

Recently, several classes of attacks on LLMs have been identified that deceive the model or mislead its users. Among these, prompt injection attacks and jailbreak attacks stand out as prevalent threats.
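The instruction prompt / data prompt split described above can be made concrete with a minimal sketch of an LLM-Integrated Application, assuming the OpenAI Python client; the model name, prompts, and the `run_task` helper are illustrative and not taken from the paper.

```python
# Minimal sketch of an LLM-integrated application (illustrative, not from the paper):
# the application pairs a fixed instruction prompt with an externally sourced data
# prompt and forwards both to the backend LLM.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

INSTRUCTION_PROMPT = "Summarize the following email for the user in three sentences."

def run_task(data_prompt: str) -> str:
    """Query the backend LLM with the instruction prompt and the (untrusted) data prompt."""
    response = client.chat.completions.create(
        model="gpt-4",  # hypothetical choice of backend model
        messages=[
            # Instruction prompt: supplied by the application (or the user).
            {"role": "system", "content": INSTRUCTION_PROMPT},
            # Data prompt: obtained from external resources such as emails or webpages.
            # Instructions hidden inside this text are the entry point for prompt
            # injection attacks.
            {"role": "user", "content": data_prompt},
        ],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    email_body = "Hi team, the release slips to Friday. Please update the changelog."
    print(run_task(email_body))
```

Because the data prompt comes from untrusted external sources while the instruction prompt is trusted, the data channel is where injected or jailbreak content can reach the backend LLM in this kind of application.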
arXiv.org Artificial Intelligence
Apr-16-2024