Why AI Breaks Bad

WIRED 

Once in a while, LLMs turn evil--and no one quite knows why.

The AI company Anthropic has made a rigorous effort to build a large language model with positive human values. The $183 billion company's flagship product is Claude, and much of the time, its engineers say, Claude is a model citizen. Its standard persona is warm and earnest. When users tell Claude to "answer like I'm a fourth grader" or "you have a PhD in archeology," it gamely plays along. But every so often, Claude breaks bad. It makes threats and then carries them out. And the frustrating part--true of all LLMs--is that no one knows exactly why.

Consider a recent stress test that Anthropic's safety engineers ran on Claude. In their fictional scenario, the model was to take on the role of Alex, an AI belonging to the Summit Bridge corporation.