How OpenAI stress-tests its large language models

Nov-21-2024, 18:00:10 GMT–MIT Technology Review

The first paper describes how OpenAI directs an extensive network of human testers outside the company to vet the behavior of its models before they are released. The second paper presents a new way to automate parts of the testing process, using a large language model like GPT-4 to come up with novel ways to bypass its own guardrails. The aim is to combine these two approaches, with unwanted behaviors discovered by human testers handed off to an AI to be explored further and vice versa. Automated red-teaming can come up with a large number of different behaviors, but human testers bring more diverse perspectives into play, says Lama Ahmad, a researcher at OpenAI: "We are still thinking about the ways that they complement each other." AI companies have repurposed the approach from cybersecurity, where teams of people try to find vulnerabilities in large computer systems.

language model, openai stress-test, tester

News Web Page

Country:
- North America > United States (0.33)

Industry:
- Government > Regional Government > North America Government > United States Government (0.33)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language
    - Large Language Model (1.00)
    - Chatbot (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning > Generative AI (1.00)