Beyond Graphs: Can Large Language Models Comprehend Hypergraphs?
Yifan Feng, Chengwu Yang, Xingliang Hou, Shaoyi Du, Shihui Ying, Zongze Wu, Yue Gao
Existing benchmarks like NLGraph and GraphQA evaluate LLMs on graphs by focusing mainly on pairwise relationships, overlooking the high-order correlations found in real-world data. Hypergraphs, which can model complex beyond-pairwise relationships, offer a more expressive framework but remain underexplored in the context of LLMs. To address this gap, we introduce LLM4Hypergraph, the first comprehensive benchmark comprising 21,500 problems across eight low-order, five high-order, and two isomorphism tasks, utilizing both synthetic and real-world hypergraphs from citation networks and protein structures. We evaluate six prominent LLMs, including GPT-4o, demonstrating our benchmark's effectiveness in identifying model strengths and weaknesses. Our specialized prompting framework incorporates seven hypergraph languages and introduces two novel techniques, Hyper-BAG and Hyper-COT, which enhance high-order reasoning and achieve an average 4% (up to 9%) performance improvement on structure classification tasks. This work establishes a foundational testbed for integrating hypergraph computational capabilities into LLMs, advancing their comprehension.

Large Language Models (LLMs) (Vaswani, 2017; Devlin, 2018; Brown, 2020; Ouyang et al., 2022) have made significant strides in domains such as dialogue systems (Bubeck et al., 2023) and image understanding (Zhao et al., 2023). However, they often produce untruthful or unsupported content, known as hallucinations (Wang et al., 2023). To mitigate this, Retrieval-Augmented Generation (RAG) (Vu et al., 2023) enhances prompts with relevant, factual, and up-to-date information (Khandelwal et al., 2019), thereby grounding outputs more effectively. RAG typically retrieves structured data with complex relational dependencies (Guu et al., 2020), such as social networks or molecular structures, which are efficiently represented as graphs. Graph representations capture intricate interdependencies and provide a concise encapsulation of data relationships. This has spurred research to improve LLMs' understanding of graph-structured data (Guo et al., 2023), leading to benchmarks like NLGraph (Wang et al., 2024), GraphQA (Fatemi et al., 2023), and LLM4DyG (Zhang et al., 2023). These benchmarks evaluate and enhance LLMs' capabilities in handling graph-related tasks, promoting the integration of graph-based representations in large language models.

However, real-world data often involve complex correlations beyond simple pairwise relationships (Zhou et al., 2006). For example, sentences within a document sharing common keywords may exhibit high-order correlations that traditional graph models fail to capture (PM et al., 2017). In multimodal scenarios (Kim et al., 2020; Feng et al., 2023), interactions across different data types further increase correlation complexity, exceeding the capabilities of conventional graphs, which are limited to pairwise correlations.
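To make the pairwise-versus-high-order distinction concrete, the minimal Python sketch below (our illustration, not code from the paper) contrasts a clique-expansion graph with a hypergraph for the keyword-sharing example; the sentence labels and the keyword are hypothetical.

```python
# Illustrative sketch: a beyond-pairwise relation as one hyperedge
# versus its pairwise (clique-expansion) approximation.

# Three sentences in a document that all share the keyword "retrieval".
sentences = ["s1", "s2", "s3"]

# A pairwise graph can only record 2-way links, so the shared keyword
# becomes three separate edges, losing the fact that the relation
# involves all three sentences at once.
pairwise_edges = [("s1", "s2"), ("s1", "s3"), ("s2", "s3")]

# A hypergraph keeps the high-order relation intact: one hyperedge
# (here, a frozenset of vertices) groups every sentence sharing the
# keyword, regardless of how many there are.
hyperedges = {
    "keyword:retrieval": frozenset({"s1", "s2", "s3"}),
}

def incident(vertex, hyperedges):
    """Return the names of hyperedges that contain the given vertex."""
    return [name for name, members in hyperedges.items() if vertex in members]

print(incident("s2", hyperedges))  # ['keyword:retrieval']
```

Serializing structures like `hyperedges` into text is essentially what hypergraph languages for LLM prompting must do, whereas a pairwise edge list irreversibly flattens the group relation.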
arXiv.org Artificial Intelligence
Oct-16-2024