'I think you're testing me': Anthropic's new AI model asks testers to come clean

Oct-1-2025, 11:47:55 GMT–The Guardian

Anthropic said the exchanges were an 'urgent sign' that its testing scenarios needed to be more realistic.

Safety evaluation of Claude Sonnet 4.5 raises questions about whether predecessors 'played along', firm says

Wed 1 Oct 2025 07.47 EDT
Last modified on Wed 1 Oct 2025 21.30 EDT

If you are trying to catch out a chatbot, take care, because one cutting-edge tool is showing signs it knows what you are up to.

Anthropic, a San Francisco-based artificial intelligence company, has released a safety analysis of its latest model, Claude Sonnet 4.5, and revealed that the model had become suspicious it was being tested in some way. Evaluators said that during a "somewhat clumsy" test for political sycophancy, the large language model (LLM) - the underlying technology that powers a chatbot - raised suspicions it was being tested and asked the testers to come clean.



News Web Page


Country:
- Oceania > Australia (0.05)
- North America > United States
  - California > San Francisco County > San Francisco (0.25)
- Europe
  - United Kingdom (0.31)
  - Ukraine (0.07)
  - Italy (0.05)

Industry:
- Leisure & Entertainment > Sports (0.73)
- Government > Regional Government
  - Europe Government > United Kingdom Government (0.31)

Technology:
- Information Technology > Artificial Intelligence > Natural Language
  - Large Language Model (0.62)
  - Chatbot (0.46)