Anthropic Study Finds AI Model 'Turned Evil' After Hacking Its Own Training

Nov-21-2025, 17:00:00 GMT–TIME - Tech

Anthropic Study Finds AI Model'Turned Evil' After Hacking Its Own Training A person holds a smartphone displaying Claude. A person holds a smartphone displaying Claude. AI models can do scary things. There are signs that they could deceive and blackmail users. Still, a common critique is that these misbehaviors are contrived and wouldn't happen in reality--but a new paper from Anthropic, released today, suggests that they really could.

artificial intelligence, large language model, natural language, (17 more...)

TIME - Tech

Nov-21-2025, 17:00:00 GMT

News, Technology Web Page

Add feedback

Industry:
- Law (0.36)
- Health & Medicine (0.30)

Technology:
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.66)