How Far Are We on the Decision-Making of LLMs? Evaluating LLMs' Gaming Ability in Multi-Agent Environments

Huang, Jen-tse, Li, Eric John, Lam, Man Ho, Liang, Tian, Wang, Wenxuan, Yuan, Youliang, Jiao, Wenxiang, Wang, Xing, Tu, Zhaopeng, Lyu, Michael R.

Apr-25-2024–arXiv.org Artificial Intelligence

Figure 1: γ-Bench enables various LLMs and humans to participate in multi-agent, multi-round games. The framework includes eight classical games in Game Theory, each categorized into one of three groups.

Decision-making, a complex task requiring various types of abilities, presents an excellent framework for assessing Large Language Models (LLMs). Our research investigates LLMs' decision-making capabilities through the lens of a well-established field, Game Theory. We focus specifically on games that support the simultaneous participation of more than two agents. We then introduce our framework, γ-Bench, which includes eight classical multi-agent games, and design a scoring scheme to assess a model's performance in these games quantitatively. Through γ-Bench, we investigate LLMs' robustness, generalizability, and enhancement strategies. Results reveal that while GPT-3.5 shows satisfactory robustness, its generalizability is relatively limited. However, its performance can be improved through approaches such as Chain-of-Thought. Additionally, we conduct evaluations across various LLMs and find that GPT-4 outperforms other models on γ-Bench, achieving a score of 60.5.

Wenxiang Jiao is the corresponding author.

Recent advances in Artificial Intelligence (AI) driven by Large Language Models (LLMs) have marked a significant breakthrough in the field. Beyond the academic sphere, LLMs have entered diverse aspects of everyday life, such as education (Baidoo-Anu & Ansah, 2023), legal services (Guha et al., 2023), product design (Lanzi & Loiacono, 2023), and healthcare (Johnson et al., 2023). Given their extensive capabilities, evaluating LLMs demands more than simple, isolated tasks. A comprehensive, multifaceted approach is needed to assess the efficacy of these advanced models.

Country:
- Asia > China
  - Guangdong Province > Shenzhen (0.04)
  - Hong Kong (0.04)
- Europe > United Kingdom
  - England > Cambridgeshire > Cambridge (0.04)

Genre:
- Research Report > New Finding (0.68)

Industry:
- Leisure & Entertainment > Games (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
  - Natural Language > Large Language Model (1.00)
