Large Language Models Play StarCraft II: Benchmarks and A Chain of Summarization Approach

Weiyu Ma

Oct-10-2025, 21:08:42 GMT–Neural Information Processing Systems

With the continued advancement of Large Language Model (LLM) agents in reasoning, planning, and decision-making, benchmarks have become crucial for evaluating these skills. However, there is a notable gap in benchmarks for real-time strategic decision-making. StarCraft II (SC2), with its complex and dynamic nature, serves as an ideal setting for such evaluations. To this end, we have developed TextStarCraft II, a specialized environment for assessing LLMs in real-time strategic scenarios within SC2. To address the limitations of traditional Chain of Thought (CoT) methods, we introduce the Chain of Summarization (CoS) method, which enhances LLMs' capabilities in rapid and effective decision-making. Our key experiments included: 1. LLM Evaluation: We tested 10 LLMs in TextStarCraft II; most of them defeated the LV5 built-in AI, showcasing effective strategy skills.

opponent, protoss, starcraft ii


Country:
- Europe > Sweden
  - Stockholm > Stockholm (0.04)
- Asia
  - South Korea (0.14)
  - China > Jiangsu Province
    - Nanjing (0.04)

Genre:
- Overview (0.67)
- Research Report
  - New Finding (1.00)
  - Experimental Study (1.00)

Industry:
- Government > Military (1.00)
- Leisure & Entertainment
  - Sports (1.00)
  - Games > Computer Games (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.95)
