Ignore Previous Prompt: Attack Techniques For Language Models

Nov-17-2022–arXiv.org Artificial Intelligence

Transformer-based large language models (LLMs) provide a powerful foundation for natural language tasks in large-scale customer-facing applications. However, studies that explore their vulnerabilities emerging from malicious user interaction are scarce. By proposing PromptInject, a prosaic alignment framework for mask-based iterative adversarial prompt composition, we examine how GPT-3, the most widely deployed language model in production, can be easily misaligned by simple handcrafted inputs. In particular, we investigate two types of attacks -- goal hijacking and prompt leaking -- and demonstrate that even low-aptitude, but sufficiently ill-intentioned agents, can easily exploit GPT-3's stochastic nature, creating long-tail risks. The code for PromptInject is available at https://github.com/agencyenterprise/PromptInject.

large language model, machine learning, natural language, (14 more...)

arXiv.org Artificial Intelligence

Nov-17-2022

arXiv.org PDF

Add feedback

Country:
- South America > Chile
  - Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States
  - California > Los Angeles County > Los Angeles (0.04)
- Europe > Spain
  - Catalonia > Barcelona Province > Barcelona (0.04)
- Asia > Middle East
  - Republic of Türkiye > Batman Province > Batman (0.04)

Genre:
- Research Report (1.00)
- Personal > Interview (0.46)

Industry:
- Government (0.93)
- Information Technology > Security & Privacy (0.93)
- Law Enforcement & Public Safety (0.69)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found