PillagerBench: Benchmarking LLM-Based Agents in Competitive Minecraft Team Environments
Schipper, Olivier, Zhang, Yudi, Du, Yali, Pechenizkiy, Mykola, Fang, Meng
arXiv.org Artificial Intelligence
Abstract--LLM-based agents have shown promise in various cooperative and strategic reasoning tasks, but their effectiveness in competitive multi-agent environments remains underexplored. To address this gap, we introduce PillagerBench, a novel framework for evaluating multi-agent systems in real-time competitive team-vs-team scenarios in Minecraft. It provides an extensible API, multi-round testing, and rule-based built-in opponents for fair, reproducible comparisons. We also propose TactiCrafter, an LLM-based multi-agent system that facilitates teamwork through human-readable tactics, learns causal dependencies, and adapts to opponent strategies. Our evaluation demonstrates that TactiCrafter outperforms baseline approaches and showcases adaptive learning through self-play. Additionally, we analyze its learning process and strategic evolution over multiple game episodes. To encourage further research, we have open-sourced PillagerBench, fostering advancements in multi-agent AI for competitive environments.

Large Language Models (LLMs) have rapidly emerged as powerful tools for complex reasoning, decision-making, and facilitating multi-agent collaboration [27, 20, 22]. This has driven increasing interest in developing cooperative multi-agent systems [3, 7], leading to the creation of benchmarks based on diverse cooperative games such as Minecraft [4] and Overcooked [1]. Minecraft, in particular, has become an important platform due to its open-ended environment and rich state and action spaces [19, 25]. However, current Minecraft-based benchmarks mainly address cooperative tasks characterized by stationary dynamics and fixed objectives, making them insufficient for evaluating adaptability and strategic decision-making in competitive, dynamic environments.
Traditional reinforcement learning benchmarks like StarCraft Multi-Agent Challenge (SMAC) [16] and Lux AI Challenge [18] introduce instability and nonstationarity through competitive adversaries but lack the rich, open-ended interactions found in Minecraft. Bridging this gap by integrating both cooperative and competitive elements within a single dynamic environment is essential to rigorously assess the adaptability and generalizability of advanced multi-agent systems.
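To make the evaluation setup concrete, the sketch below illustrates what a multi-round team-vs-team evaluation loop of the kind the abstract describes might look like. This is an illustrative sketch only: PillagerBench's actual API is defined in its open-source repository, and every class, field, and function name here is a hypothetical stand-in.

```python
# Hypothetical sketch of a multi-round, team-vs-team evaluation loop.
# None of these names come from the PillagerBench codebase.
from dataclasses import dataclass, field

@dataclass
class RoundResult:
    winner: str          # "agents", "opponent", or "draw"
    agent_score: int
    opponent_score: int

@dataclass
class MatchSummary:
    rounds: list = field(default_factory=list)

    def win_rate(self) -> float:
        # Fraction of rounds won by the LLM-based team.
        wins = sum(1 for r in self.rounds if r.winner == "agents")
        return wins / len(self.rounds) if self.rounds else 0.0

def run_match(play_round, num_rounds: int = 5) -> MatchSummary:
    """Run a multi-round match. `play_round` is any callable that pits
    a team of agents against a rule-based opponent and returns a
    RoundResult; repeating rounds supports reproducible comparisons."""
    summary = MatchSummary()
    for i in range(num_rounds):
        summary.rounds.append(play_round(i))
    return summary

# Stub round function for demonstration; a real round would run a
# Minecraft scenario to completion and score both teams.
def stub_round(i: int) -> RoundResult:
    winner = "agents" if i % 2 == 0 else "opponent"
    return RoundResult(winner=winner, agent_score=i, opponent_score=5 - i)

summary = run_match(stub_round, num_rounds=5)
print(f"win rate: {summary.win_rate():.2f}")  # agents win rounds 0, 2, 4
```

Aggregating outcomes over multiple rounds, rather than a single game, is what lets a benchmark distinguish genuine adaptation to an opponent from one-off luck.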
Sep-9-2025