PillagerBench: Benchmarking LLM-Based Agents in Competitive Minecraft Team Environments
Schipper, Olivier, Zhang, Yudi, Du, Yali, Pechenizkiy, Mykola, Fang, Meng
arXiv.org Artificial Intelligence
Abstract--LLM-based agents have shown promise in various cooperative and strategic reasoning tasks, but their effectiveness in competitive multi-agent environments remains underexplored. To address this gap, we introduce PillagerBench, a novel framework for evaluating multi-agent systems in real-time competitive team-vs-team scenarios in Minecraft. It provides an extensible API, multi-round testing, and rule-based built-in opponents for fair, reproducible comparisons. We also propose TactiCrafter, an LLM-based multi-agent system that facilitates teamwork through human-readable tactics, learns causal dependencies, and adapts to opponent strategies. Our evaluation demonstrates that TactiCrafter outperforms baseline approaches and showcases adaptive learning through self-play. Additionally, we analyze its learning process and strategic evolution over multiple game episodes. To encourage further research, we have open-sourced PillagerBench, fostering advancements in multi-agent AI for competitive environments.

Large Language Models (LLMs) have rapidly emerged as powerful tools for complex reasoning, decision-making, and facilitating multi-agent collaboration [27, 20, 22]. This has driven increasing interest in developing cooperative multi-agent systems [3, 7], leading to the creation of benchmarks based on diverse cooperative games such as Minecraft [4] and Overcooked [1]. Minecraft, in particular, has become an important platform due to its open-ended environment and rich state and action spaces [19, 25]. However, current Minecraft-based benchmarks mainly address cooperative tasks characterized by stationary dynamics and fixed objectives, making them insufficient for evaluating adaptability and strategic decision-making in competitive, dynamic environments.
Traditional reinforcement learning benchmarks like StarCraft Multi-Agent Challenge (SMAC) [16] and Lux AI Challenge [18] introduce instability and nonstationarity through competitive adversaries but lack the rich, open-ended interactions found in Minecraft. Bridging this gap by integrating both cooperative and competitive elements within a single dynamic environment is essential to rigorously assess the adaptability and generalizability of advanced multi-agent systems.
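To make the evaluation setup concrete, the sketch below illustrates what a multi-round team-vs-team evaluation loop of the kind the abstract describes might look like. This is an illustrative sketch only: PillagerBench's actual API is defined in its open-source repository, and every class, field, and function name here is a hypothetical stand-in.

```python
# Hypothetical sketch of a multi-round, team-vs-team evaluation loop.
# None of these names come from the PillagerBench codebase.
from dataclasses import dataclass, field

@dataclass
class RoundResult:
    winner: str          # "agents", "opponent", or "draw"
    agent_score: int
    opponent_score: int

@dataclass
class MatchSummary:
    rounds: list = field(default_factory=list)

    def win_rate(self) -> float:
        # Fraction of rounds won by the LLM-based team.
        wins = sum(1 for r in self.rounds if r.winner == "agents")
        return wins / len(self.rounds) if self.rounds else 0.0

def run_match(play_round, num_rounds: int = 5) -> MatchSummary:
    """Run a multi-round match. `play_round` is any callable that pits
    a team of agents against a rule-based opponent and returns a
    RoundResult; repeating rounds supports reproducible comparisons."""
    summary = MatchSummary()
    for i in range(num_rounds):
        summary.rounds.append(play_round(i))
    return summary

# Stub round function for demonstration; a real round would run a
# Minecraft scenario to completion and score both teams.
def stub_round(i: int) -> RoundResult:
    winner = "agents" if i % 2 == 0 else "opponent"
    return RoundResult(winner=winner, agent_score=i, opponent_score=5 - i)

summary = run_match(stub_round, num_rounds=5)
print(f"win rate: {summary.win_rate():.2f}")  # agents win rounds 0, 2, 4
```

Aggregating outcomes over multiple rounds, rather than a single game, is what lets a benchmark distinguish genuine adaptation to an opponent from one-off luck.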
Sep-9-2025