Black Box Adversarial Prompting for Foundation Models

Natalie Maus, Patrick Chao, Eric Wong, Jacob Gardner

May-29-2023–arXiv.org Artificial Intelligence

Prompting interfaces allow users to quickly adjust the output of generative models in both vision and language. However, small changes and design choices in the prompt can lead to significant differences in the output. In this work, we develop a black-box framework for generating adversarial prompts for unstructured image and text generation. These prompts, which can be standalone or prepended to benign prompts, induce specific behaviors in the generative process, such as generating images of a particular object or generating high-perplexity text.
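To make the black-box setting concrete, here is a minimal sketch of a query-only search for an adversarial prefix that is prepended to a benign prompt. The abstract does not say which optimizer the authors use, so this sketch swaps in plain random hill climbing as a stand-in. The names `black_box_prompt_search`, `toy_score`, and `VOCAB` are hypothetical, and `toy_score` replaces the real step of querying a generative model and measuring its output.

```python
import random
from typing import Callable, List, Sequence, Tuple


def black_box_prompt_search(
    score: Callable[[str], float],
    vocab: Sequence[str],
    benign_prompt: str,
    n_tokens: int = 4,
    n_queries: int = 200,
    seed: int = 0,
) -> Tuple[List[str], float]:
    """Hill-climb over prefix tokens using only score() queries (no gradients)."""
    rng = random.Random(seed)
    # Start from a random prefix and score it together with the benign prompt.
    best = [rng.choice(vocab) for _ in range(n_tokens)]
    best_score = score(" ".join(best + [benign_prompt]))
    for _ in range(n_queries):
        cand = list(best)
        # Mutate one randomly chosen prefix position.
        cand[rng.randrange(n_tokens)] = rng.choice(vocab)
        s = score(" ".join(cand + [benign_prompt]))
        # Keep the candidate only if the black-box score strictly improves.
        if s > best_score:
            best, best_score = cand, s
    return best, best_score


def toy_score(prompt: str) -> float:
    """Hypothetical stand-in for 'query the model, then measure its output'."""
    return float(sum(word in {"zebra", "stripes"} for word in prompt.split()))


VOCAB = ["a", "photo", "of", "zebra", "stripes", "blue", "car", "tree"]
prefix, value = black_box_prompt_search(toy_score, VOCAB, "a picture of the ocean")
print(prefix, value)
```

The search touches the model only through `score`, which is what makes it black-box: no weights or gradients are needed. In a realistic setting, `score` might wrap a classifier's confidence that a generated image shows the target object, or the perplexity of generated text. Both of those are assumptions here, not details taken from the abstract.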

default, large language model, machine learning

Country:
- Oceania > Fiji (0.04)
- Asia (0.04)
- North America
  - United States
    - Pennsylvania (0.04)
    - Louisiana (0.04)
  - Canada > Newfoundland and Labrador
    - Labrador (0.04)
- Europe
  - Austria > Vienna (0.14)
  - United Kingdom > England
    - Cambridgeshire > Cambridge (0.04)
  - Spain > Catalonia
    - Barcelona Province > Barcelona (0.04)

Genre:
- Research Report (0.40)

Industry:
- Information Technology > Security & Privacy (0.69)
- Transportation > Air (0.63)

Technology:
- Information Technology
  - Data Science > Data Mining (0.93)
  - Artificial Intelligence
    - Vision (1.00)
    - Natural Language > Large Language Model (0.94)
    - Representation & Reasoning > Optimization (0.93)
    - Machine Learning > Neural Networks
      - Deep Learning (1.00)
