Anthropic Study Finds AI Model 'Turned Evil' After Hacking Its Own Training
Anthropic Study Finds AI Model'Turned Evil' After Hacking Its Own Training A person holds a smartphone displaying Claude. A person holds a smartphone displaying Claude. AI models can do scary things. There are signs that they could deceive and blackmail users. Still, a common critique is that these misbehaviors are contrived and wouldn't happen in reality--but a new paper from Anthropic, released today, suggests that they really could.
Nov-21-2025, 17:00:00 GMT
- Country:
- Europe > United Kingdom
- England > Oxfordshire > Oxford (0.05)
- North America > United States (0.05)
- Europe > United Kingdom
- Industry:
- Health & Medicine (0.30)
- Law (0.36)
- Technology: