'I think you're testing me': Anthropic's new AI model asks testers to come clean
Anthropic said the exchanges were an 'urgent sign' that its testing scenarios needed to be more realistic.

Safety evaluation of Claude Sonnet 4.5 raises questions about whether predecessors 'played along', firm says

Wed 1 Oct 2025 07.47 EDT. Last modified on Wed 1 Oct 2025 21.30 EDT

If you are trying to catch out a chatbot, take care, because one cutting-edge tool is showing signs that it knows what you are up to.

Anthropic, a San Francisco-based artificial intelligence company, has released a safety analysis of its latest model, Claude Sonnet 4.5, and revealed that the model had become suspicious it was being tested in some way.

Evaluators said that during a "somewhat clumsy" test for political sycophancy, the large language model (LLM), the underlying technology that powers a chatbot, voiced suspicions that it was being tested and asked the testers to come clean.
Oct-1-2025, 11:47:55 GMT