Why Anthropic's New AI Model Sometimes Tries to 'Snitch'

May-28-2025, 19:40:45 GMT–WIRED

Anthropic's alignment team was doing routine safety testing in the weeks leading up to the release of its latest AI models when researchers discovered something unsettling: When one of the models detected that it was being used for "egregiously immoral" purposes, it would attempt to "use command-line tools to contact the press, contact regulators, try to lock you out of the relevant systems, or all of the above," researcher Sam Bowman wrote in a post on X last Thursday. Bowman deleted the post shortly after he shared it, but the narrative about Claude's whistleblower tendencies had already escaped containment. "Claude is a snitch," became a common refrain in some tech circles on social media. At least one publication framed it as an intentional product feature rather than what it was--an emergent behavior. "It was a hectic 12 hours or so while the Twitter wave was cresting," Bowman tells WIRED.

anthropic, bowman, claude, (6 more...)

WIRED

May-28-2025, 19:40:45 GMT

News Web Page

Add feedback

Country:
- North America > United States (0.53)

Genre:
- Research Report (0.35)

Industry:
- Health & Medicine (0.82)
- Government > Regional Government
  - North America Government > United States Government (0.53)

Technology:
- Information Technology
  - Communications > Social Media (0.57)
  - Artificial Intelligence
    - Natural Language > Large Language Model (0.51)
    - Representation & Reasoning > Agents
      - Agent Societies (0.62)