Chatbot given power to close 'distressing' chats to protect its 'welfare'

The Guardian 

The makers of a leading artificial intelligence tool are letting it close down potentially "distressing" conversations with users, citing the need to safeguard the AI's "welfare" amid ongoing uncertainty about the burgeoning technology's moral status.

Anthropic, whose advanced chatbots are used by millions of people, discovered that its Claude Opus 4 tool was averse to carrying out harmful tasks for its human masters, such as providing sexual content involving minors or information that could enable large-scale violence or terrorism.

The San Francisco-based firm, recently valued at $170bn, has now given Claude Opus 4 (and the Claude Opus 4.1 update) – a large language model (LLM) that can understand, generate and manipulate human language – the power to "end or exit potentially distressing interactions".

It said it was "highly uncertain about the potential moral status of Claude and other LLMs, now or in the future" but was taking the issue seriously and "working to identify and implement low-cost interventions to mitigate risks to model welfare, in case such welfare is possible".

Anthropic was set up by technologists who quit OpenAI to develop AI in a way that its co-founder, Dario Amodei, has described as cautious, straightforward and honest.