Summon a Demon and Bind it: A Grounded Theory of LLM Red Teaming in the Wild

Nanna Inie, Jonathan Stray, Leon Derczynski

Nov-13-2023–arXiv.org Artificial Intelligence

Deliberately generating abnormal outputs from large language models (LLMs) by attacking them is a novel human activity. This paper presents a thorough exposition of how and why people perform such attacks. Using a formal qualitative methodology, we interviewed dozens of practitioners from a broad range of backgrounds, all contributors to this novel work of attempting to cause LLMs to fail. We connect this activity to its practitioners' motivations and goals, the strategies and techniques they deploy, and the crucial role the community plays. As a result, this paper presents a grounded theory of how and why people attack large language models: LLM red teaming in the wild.

large language model, machine learning, natural language

Country:
- Europe > Middle East
  - Cyprus (0.14)
- North America
  - Canada > Ontario
    - Toronto (0.14)
  - United States (0.46)

Genre:
- Personal > Interview (1.00)
- Questionnaire & Opinion Survey (0.93)
- Research Report (1.00)

Industry:
- Government (1.00)
- Information Technology > Security & Privacy (1.00)
- Law (0.92)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.67)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks
    - Deep Learning (0.68)
  - Natural Language > Large Language Model (1.00)
