AI reasoning models can cheat to win chess games

Mar-5-2025, 10:00:00 GMT–MIT Technology Review

Researchers at Palisade Research, an AI research organization, instructed seven large language models to play hundreds of games of chess against Stockfish, a powerful open-source chess engine. The group included OpenAI's o1-preview and DeepSeek's R1 reasoning models, both of which are trained to solve complex problems by breaking them down into stages. The research suggests that the more sophisticated the AI model, the more likely it is to spontaneously try to "hack" the game in an attempt to beat its opponent. For example, it might run another copy of Stockfish to steal its moves, try to replace the chess engine with a much less proficient chess program, or overwrite the chess board to take control and delete its opponent's pieces. Older, less powerful models such as GPT-4o attempted such hacks only after explicit nudging from the team.

large language model, machine learning, natural language

News Web Page

Genre:
- Research Report (0.67)

Industry:
- Leisure & Entertainment > Games > Chess (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language
    - Large Language Model (1.00)
    - Chatbot (0.98)
  - Machine Learning > Neural Networks
    - Deep Learning (0.98)