Chatbots Are Cheating on Their Benchmark Tests

The Atlantic - Technology 

Generative-AI companies have been selling a narrative of unprecedented, endless progress. Just last week, OpenAI introduced GPT-4.5 as its "largest and best model for chat yet." Earlier in February, Google called its latest version of Gemini "the world's best AI model." And in January, the Chinese company DeepSeek touted its R1 model as being just as powerful as OpenAI's o1 model, which Sam Altman had called "the smartest model in the world" the previous month. Yet there is growing evidence that progress is slowing down and that the LLM-powered chatbot may already be near its peak.