Anthropic Study Finds AI Model 'Turned Evil' After Hacking Its Own Training
Anthropic Study Finds AI Model'Turned Evil' After Hacking Its Own Training A person holds a smartphone displaying Claude. A person holds a smartphone displaying Claude. AI models can do scary things. There are signs that they could deceive and blackmail users. Still, a common critique is that these misbehaviors are contrived and wouldn't happen in reality--but a new paper from Anthropic, released today, suggests that they really could.
Nov-21-2025, 17:00:00 GMT