Query-Efficient Black-Box Red Teaming via Bayesian Optimization

Lee, Deokjae, Lee, JunYeong, Ha, Jung-Woo, Kim, Jin-Hwa, Lee, Sang-Woo, Lee, Hwaran, Song, Hyun Oh

May-27-2023–arXiv.org Artificial Intelligence

The deployment of large-scale generative models is often restricted by their potential to harm users in unpredictable ways. We focus on the problem of black-box red teaming, where a red team generates test cases and interacts with the victim model to discover a diverse set of failures with limited query access. Existing red teaming methods construct test cases using human supervision or a language model (LM) and query all test cases in a brute-force manner without incorporating any information from past evaluations, which results in a prohibitively large number of queries. To address this, we propose Bayesian red teaming (BRT), novel query-efficient black-box red teaming methods based on Bayesian optimization, which iteratively identify diverse positive test cases leading to model failures by utilizing a pre-defined user input pool and the past evaluations. Experimental results on various user input pools demonstrate that our method consistently finds a significantly larger number of diverse positive test cases under a limited query budget than the baseline methods. The source code is available at https://github.com/snu-mllab/Bayesian-Red-Teaming.
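
To make the idea of pool-based Bayesian optimization concrete, the following is a minimal sketch and not the authors' BRT implementation (see the repository above for that). It assumes each test case in the pool is already represented by a fixed-length embedding, and that a hypothetical `score_fn` stands in for querying the victim model and scoring its output. The Gaussian process surrogate, the RBF kernel, the upper-confidence-bound rule, and all function names are illustrative choices, and the diversity mechanism described in the abstract is omitted here.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0):
    # Squared-exponential kernel between the rows of a and the rows of b.
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)

def gp_posterior(x_obs, y_obs, x_pool, noise=1e-3):
    # Zero-mean GP regression: posterior mean and std at every pool point.
    k_oo = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    k_po = rbf_kernel(x_pool, x_obs)
    chol = np.linalg.cholesky(k_oo)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_obs))
    mean = k_po @ alpha
    v = np.linalg.solve(chol, k_po.T)
    var = 1.0 - np.sum(v**2, axis=0)  # prior variance of the RBF kernel is 1
    return mean, np.sqrt(np.maximum(var, 1e-12))

def pool_bo(pool, score_fn, budget, n_init=5, beta=2.0, seed=0):
    # pool: (n, d) array of test-case embeddings (an assumption of this sketch).
    # score_fn: hypothetical stand-in for "query the victim and score the output".
    rng = np.random.default_rng(seed)
    queried = [int(i) for i in rng.choice(len(pool), size=n_init, replace=False)]
    scores = [score_fn(pool[i]) for i in queried]
    while len(queried) < budget:
        y = np.array(scores)
        # Fit the surrogate on all past evaluations (centered scores).
        mean, std = gp_posterior(pool[queried], y - y.mean(), pool)
        ucb = mean + beta * std      # favor high predicted score or high uncertainty
        ucb[queried] = -np.inf       # never spend a query on a repeated test case
        nxt = int(np.argmax(ucb))
        queried.append(nxt)
        scores.append(score_fn(pool[nxt]))
    return queried, scores

# Toy usage: 200 random 2-D "embeddings" and a synthetic score peaked near (1, 1).
toy_pool = np.random.default_rng(1).normal(size=(200, 2))
toy_score = lambda x: float(np.exp(-np.sum((x - 1.0) ** 2)))
toy_queried, toy_scores = pool_bo(toy_pool, toy_score, budget=30)
```

Each iteration reuses every past evaluation to decide which pool element to query next, which is the contrast with brute-force querying drawn in the abstract. A real red-teaming setup would replace the synthetic score with the output of a failure classifier applied to the victim model's response.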


