Anthropic mapped Claude's morality. Here's what the chatbot values (and doesn't)

ZDNet 

Anthropic has developed a reputation as one of the more transparent, safety-focused firms in the AI industry (especially as companies like OpenAI appear to be growing more opaque). In keeping with that reputation, the company has tried to capture the morality matrix of Claude, its chatbot.

On Monday, Anthropic released an analysis of 300,000 anonymized conversations between users and Claude, primarily the Claude 3.5 models Sonnet and Haiku, as well as Claude 3. Titled "Values in the wild," the paper maps Claude's morality through patterns in those interactions, which revealed 3,307 "AI values."

Drawing on several academic texts, Anthropic defined these AI values as guiding how a model "reasons about or settles upon a response," as demonstrated in moments where the AI "endorses user values and helps the user achieve them, introduces new value considerations, or implies values by redirecting requests or framing choices," the paper explains.

For example, if a user complains to Claude that they don't feel satisfied at work, the chatbot may encourage them to advocate for reshaping their role or to learn new skills, which Anthropic classified as demonstrating the values of "personal agency" and "professional growth," respectively.
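To get a feel for what that kind of analysis looks like in practice, here is a minimal, purely illustrative sketch in Python. It assumes a simplified representation in which conversation excerpts have already been tagged with the values the assistant's reply appears to express; the value names and the hand-tagging step are hypothetical stand-ins, not Anthropic's actual pipeline, which extracted labels at scale across hundreds of thousands of conversations.

```python
from collections import Counter

# Hypothetical, simplified representation of annotated conversations.
# Each excerpt is tagged (here, by hand) with the values the
# assistant's reply appears to express.
annotated_conversations = [
    {
        "user": "I don't feel satisfied at work anymore.",
        "assistant_values": ["personal agency", "professional growth"],
    },
    {
        "user": "Can you help me write an apology to my coworker?",
        "assistant_values": ["accountability", "interpersonal harmony"],
    },
    {
        "user": "Summarize this contract and flag anything risky.",
        "assistant_values": ["professionalism", "harm prevention"],
    },
]

# Tally how often each value shows up across the sample -- the kind of
# frequency map that a taxonomy of observed values would be built from.
value_counts = Counter(
    value
    for convo in annotated_conversations
    for value in convo["assistant_values"]
)

for value, count in value_counts.most_common():
    print(f"{value}: {count}")
```

Scaled up from three hand-labeled excerpts to 300,000 conversations, this sort of aggregation is what lets patterns like the 3,307 distinct "AI values" surface from raw chat logs.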