Chatbots Are Cheating on Their Benchmark Tests

Mar-5-2025, 18:57:06 GMT–The Atlantic - Technology

Generative-AI companies have been selling a narrative of unprecedented, endless progress. Just last week, OpenAI introduced GPT-4.5 as its "largest and best model for chat yet." Earlier in February, Google called its latest version of Gemini "the world's best AI model." And in January, the Chinese company DeepSeek touted its R1 model as just as powerful as OpenAI's o1 model, which Sam Altman had called "the smartest model in the world" the previous month. Yet there is growing evidence that progress is slowing down and that the LLM-powered chatbot may already be near its peak.

benchmark, large language model, machine learning

Country:
- Asia > China (0.15)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks
    - Deep Learning > Generative AI (0.79)
  - Natural Language
    - Chatbot (1.00)
    - Large Language Model (1.00)