Large Language Models Play StarCraft II: Benchmarks and A Chain of Summarization Approach
–Neural Information Processing Systems
With the continued advancement of Large Language Model (LLM) agents in reasoning, planning, and decision-making, benchmarks have become crucial for evaluating these skills. However, there is a notable gap in benchmarks for real-time strategic decision-making. StarCraft II (SC2), with its complex and dynamic nature, serves as an ideal setting for such evaluations. To this end, we have developed TextStarCraft II, a specialized environment for assessing LLMs in real-time strategic scenarios within SC2. To address the limitations of traditional Chain of Thought (CoT) methods, we introduce the Chain of Summarization (CoS) method, which enhances LLMs' capabilities in rapid and effective decision-making. Our key experiments included: 1. LLM Evaluation: We tested 10 LLMs in TextStarCraft II; most of them defeated the LV5 built-in AI, showcasing effective strategic skills.
Mar-27-2025, 14:33:45 GMT
- Country:
- Asia > South Korea (0.14)
- Europe > Sweden (0.14)
- Genre:
- Overview (0.67)
- Research Report
- Experimental Study (1.00)
- New Finding (1.00)
- Industry:
- Government > Military (1.00)
- Information Technology (1.00)
- Leisure & Entertainment
- Games > Computer Games (1.00)
- Sports (1.00)
- Technology: