The Download: how OpenAI tests its models, and the ethics of uterus transplants
OpenAI has lifted the lid (just a crack) on its safety-testing processes. It has put out two papers describing how it stress-tests its powerful large language models to try to identify potential harmful or otherwise unwanted behavior, an approach known as red-teaming. The first paper describes how OpenAI directs an extensive network of human testers outside the company to vet the behavior of its models before they are released. The second presents a new way to automate parts of the testing process, using a large language model like GPT-4 to come up with novel ways to bypass its own guardrails. MIT Technology Review got an exclusive preview of the work.
Nov-22-2024, 13:10:00 GMT
- Technology: