GraphArena: Benchmarking Large Language Models on Graph Computational Problems

Tang, Jianheng; Zhang, Qifan; Li, Yuhan; Li, Jia

Jun-29-2024–arXiv.org Artificial Intelligence

The "arms race" of Large Language Models (LLMs) demands novel, challenging, and diverse benchmarks to faithfully examine their progress. We introduce GraphArena, a benchmarking tool designed to evaluate LLMs on graph computational problems using million-scale real-world graphs from diverse domains such as knowledge graphs, social networks, and molecular structures. GraphArena offers a suite of 10 computational tasks: four polynomial-time tasks (e.g., Shortest Distance) and six NP-complete tasks (e.g., the Travelling Salesman Problem). It features a rigorous evaluation framework that classifies each LLM output as correct, suboptimal (feasible but not optimal), or hallucinatory (properly formatted but infeasible). An evaluation of 10 leading LLMs, including GPT-4o and LLaMA3-70B-Instruct, reveals that even top-performing models struggle with larger, more complex graph problems and exhibit hallucination issues. These issues persist even when strategies such as chain-of-thought prompting are applied.
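
The three-way labelling of outputs can be illustrated for the Travelling Salesman Problem. This is a minimal sketch under stated assumptions, not GraphArena's actual implementation: the graph representation, the function names (`tour_cost`, `optimal_cost`, `classify`), and the brute-force optimum are hypothetical choices, and the sketch assumes the model's answer has already been parsed into a list of node names. Brute force is only viable for a handful of nodes; larger instances would need an exact solver.

```python
from itertools import permutations
from typing import Dict, FrozenSet, List, Optional

# Hypothetical representation: an undirected weighted graph stored as
# {frozenset({u, v}): weight}; node names are plain strings.
Graph = Dict[FrozenSet[str], float]


def tour_cost(graph: Graph, nodes: List[str], tour: List[str]) -> Optional[float]:
    """Return the length of a closed tour, or None if the tour is infeasible."""
    # Feasible only if every node appears exactly once.
    if len(tour) != len(nodes) or set(tour) != set(nodes):
        return None
    total = 0.0
    # Every consecutive pair, including the edge back to the start, must be an edge.
    for a, b in zip(tour, tour[1:] + tour[:1]):
        edge = frozenset((a, b))
        if edge not in graph:
            return None
        total += graph[edge]
    return total


def optimal_cost(graph: Graph, nodes: List[str]) -> float:
    """Brute-force optimum (fixing the first node); only viable for tiny graphs."""
    first, rest = nodes[0], nodes[1:]
    best = float("inf")
    for perm in permutations(rest):
        cost = tour_cost(graph, nodes, [first, *perm])
        if cost is not None:
            best = min(best, cost)
    return best


def classify(graph: Graph, nodes: List[str], answer: List[str]) -> str:
    """Label an already-parsed answer as correct, suboptimal, or hallucinatory."""
    cost = tour_cost(graph, nodes, answer)
    if cost is None:
        return "hallucinatory"  # well-formed list of nodes, but not a valid tour
    if cost <= optimal_cost(graph, nodes) + 1e-9:
        return "correct"
    return "suboptimal"  # valid tour, but longer than the optimum


if __name__ == "__main__":
    nodes = ["A", "B", "C", "D"]
    edges = {("A", "B"): 1, ("B", "C"): 1, ("C", "D"): 1,
             ("D", "A"): 1, ("A", "C"): 2, ("B", "D"): 2}
    graph = {frozenset(e): w for e, w in edges.items()}
    print(classify(graph, nodes, ["A", "B", "C", "D"]))  # correct (length 4)
    print(classify(graph, nodes, ["A", "C", "B", "D"]))  # suboptimal (length 6)
    print(classify(graph, nodes, ["A", "B", "A", "C"]))  # hallucinatory (repeats A)
```

Checking feasibility before optimality is what separates a hallucinatory answer from a merely suboptimal one: an answer that is not a valid tour never reaches the cost comparison.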

arxiv preprint arxiv, graph, language model

Country:
- South America > Paraguay
  - Asunción > Asunción (0.04)
- Oceania > Australia
  - Victoria > Melbourne (0.04)
- North America
  - United States
    - Texas > Travis County
      - Austin (0.04)
    - Pennsylvania > Philadelphia County
      - Philadelphia (0.04)
    - California > San Diego County
      - San Diego (0.04)
  - Canada > Ontario
    - Toronto (0.04)
- Europe
  - France (0.05)
  - Slovenia (0.04)
  - Germany > Berlin (0.04)
  - Spain > Catalonia
    - Barcelona Province > Barcelona (0.04)
  - Italy > Tuscany
    - Florence (0.04)
  - Hungary > Hajdú-Bihar County
    - Debrecen (0.04)
  - Belgium > Brussels-Capital Region
    - Brussels (0.04)
- Asia
  - Middle East > Jordan (0.04)
  - Taiwan > Taiwan Province
    - Taipei (0.04)
  - Myanmar > Tanintharyi Region
    - Dawei (0.04)
  - China
    - Hong Kong (0.04)
    - Guangdong Province > Guangzhou (0.04)

Genre:
- Research Report (0.82)

Industry:
- Information Technology > Services (0.49)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
