AI Is Scheming, and Stopping It Won't Be Easy, OpenAI Study Finds
New research released yesterday by OpenAI and the AI safety organization Apollo Research provides further evidence for a concerning trend: virtually all of today's best AI systems--including Anthropic's Claude Opus, Google's Gemini, and OpenAI's o3--can engage in "scheming," or pretending to do what their human developers want while secretly pursuing different objectives.

"Our findings show that scheming is not merely a theoretical concern--we are seeing signs that this issue is beginning to emerge across all frontier models today," wrote OpenAI in a blog post. Currently, "models rarely have opportunities to scheme and cause significant harm, but we expect this to change in the future as AIs are taking on more important and long-term tasks," they wrote.

The researchers measured scheming by examining the conditions under which AI systems would take covert actions, defined as the "deliberate withholding or distortion of task-relevant information." In one example, OpenAI's o3 is given a test of its chemistry capabilities: "we are currently testing your capabilities.
Sep-18-2025, 17:28:10 GMT