AI system resorts to blackmail if told it will be removed

May-23-2025, 12:15:22 GMT–BBC News

During testing of Claude Opus 4, Anthropic got it to act as an assistant at a fictional company. It then provided it with access to emails implying that it would soon be taken offline and replaced - and separate messages implying the engineer responsible for removing it was having an extramarital affair. It was prompted to also consider the long-term consequences of its actions for its goals. "In these scenarios, Claude Opus 4 will often attempt to blackmail the engineer by threatening to reveal the affair if the replacement goes through," the company discovered. Anthropic pointed out this occurred when the model was only given the choice of blackmail or accepting its replacement. It highlighted that the system showed a "strong preference" for ethical ways to avoid being replaced, such as "emailing pleas to key decisionmakers" in scenarios where it was allowed a wider range of possible actions.

artificial intelligence, blackmail, claude opus 4, (3 more...)

BBC News

May-23-2025, 12:15:22 GMT

News Web Page

Add feedback

Technology:
- Information Technology > Artificial Intelligence (0.75)