MultiAgentBench: Evaluating the Collaboration and Competition of LLM agents
Kunlun Zhu, Hongyi Du, Zhaochen Hong, Xiaocheng Yang, Shuyi Guo, Zhe Wang, Zhenhailong Wang, Cheng Qian, Xiangru Tang, Heng Ji, Jiaxuan You
arXiv.org Artificial Intelligence
Large Language Models (LLMs) have shown remarkable capabilities as autonomous agents, yet existing benchmarks either focus on single-agent tasks or are confined to narrow domains, failing to capture the dynamics of multi-agent coordination and competition. In this paper, we introduce MultiAgentBench, a comprehensive benchmark designed to evaluate LLM-based multi-agent systems across diverse, interactive scenarios. Our framework measures not only task completion but also the quality of collaboration and competition using novel, milestone-based key performance indicators. Moreover, we evaluate various coordination protocols (including star, chain, tree, and graph topologies) and innovative strategies such as group discussion and cognitive planning. Notably, gpt-4o-mini achieves the highest average task score, the graph structure performs best among coordination protocols in the research scenario, and cognitive planning improves milestone achievement rates by 3%. Code and datasets are publicly available at https://github.com/MultiagentBench/MARBLE.
Mar-3-2025
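
The coordination protocols named in the abstract differ mainly in which agents are allowed to exchange messages. As a minimal sketch of that idea, the four topologies can be expressed as directed adjacency lists over agent IDs. The `topology` helper and agent names below are hypothetical illustrations, not the MARBLE API:

```python
# Hypothetical sketch: the four coordination topologies from the abstract
# (star, chain, tree, graph) as directed adjacency lists over agent IDs.
# Not taken from the MARBLE codebase.

def topology(kind: str, agents: list[str]) -> dict[str, list[str]]:
    """Return an adjacency list saying which agents each agent may message."""
    if kind == "star":
        # A central coordinator talks to every worker and vice versa.
        hub, workers = agents[0], agents[1:]
        edges = {hub: list(workers)}
        edges.update({w: [hub] for w in workers})
    elif kind == "chain":
        # Each agent passes messages to the next agent in a fixed order.
        edges = {a: [b] for a, b in zip(agents, agents[1:])}
        edges[agents[-1]] = []
    elif kind == "tree":
        # Binary tree: agent i coordinates children 2i+1 and 2i+2.
        n = len(agents)
        edges = {
            agents[i]: [agents[j] for j in (2 * i + 1, 2 * i + 2) if j < n]
            for i in range(n)
        }
    elif kind == "graph":
        # Fully connected: any agent may message any other agent.
        edges = {a: [b for b in agents if b != a] for a in agents}
    else:
        raise ValueError(f"unknown topology: {kind}")
    return edges

if __name__ == "__main__":
    team = ["planner", "coder", "tester", "reviewer"]
    for kind in ("star", "chain", "tree", "graph"):
        print(kind, topology(kind, team))
```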