Knowledge Crosswords: Geometric Reasoning over Structured Knowledge with Large Language Models
Wenxuan Ding, Shangbin Feng, Yuhan Liu, Zhaoxuan Tan, Vidhisha Balachandran, Tianxing He, Yulia Tsvetkov
arXiv.org Artificial Intelligence
Large language models (LLMs) are widely adopted in knowledge-intensive tasks and have achieved impressive performance thanks to their knowledge capabilities. While LLMs have demonstrated outstanding performance on atomic or linear (multi-hop) QA tasks, whether they can reason in knowledge-rich scenarios with interweaving constraints remains an underexplored problem. In this work, we propose geometric reasoning over structured knowledge, where pieces of knowledge are connected in a graph structure and models need to fill in the missing information of this graph. Such geometric knowledge reasoning requires the ability to handle structured knowledge, reason with uncertainty, verify facts, and backtrack when an error occurs. Our analysis reveals that LLMs' ability to perform geometric reasoning over structured knowledge is still far from robust or perfect, and is susceptible to confounders such as the order of options, certain structural patterns, the assumption that a correct answer exists, and more.

Large language models (LLMs) have demonstrated impressive abilities on knowledge-intensive tasks such as open-domain QA (Petroni et al., 2019), misinformation detection (Karimi & Tang, 2019), and fact-checking (Gao et al., 2023). To assess the knowledge abilities of LLMs, existing tasks and datasets mostly focus on atomic (e.g., open-domain QA) (Rajpurkar et al., 2016; Das et al., 2022) or linear (e.g., multi-hop QA) (Press et al., 2022) settings, probing LLMs' responses to single facts or multiple concatenated facts where each reasoning step has a unique, definite answer. However, knowledge is not always arranged in a simple linear manner: it often involves more complex structural information, forming an interweaving network that connects various entities and relations through multiple chains, as illustrated in Figure 1. Whereas each reasoning step in atomic or linear QA leads to a unique and definite (intermediate) answer, in geometric QA multiple candidates exist for each blank until all constraints are jointly considered. Consequently, an underexplored yet crucial question arises: can LLMs extend beyond linear compositionality and aggregate information from multiple chains along with various knowledge constraints? Specifically, when certain pieces of knowledge are missing, can LLMs successfully fill in the blanks based on the constraints represented by the other available information in the network? In this work, we evaluate how well models can aggregate information from the given constraints across a graph representing pieces of knowledge and fill in the blanks of that graph.
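To make the multi-blank setting concrete, below is a minimal sketch in Python of geometric reasoning framed as joint constraint satisfaction over a toy knowledge graph. The entities, relations, and brute-force solver are illustrative assumptions for exposition only; they do not reflect the paper's benchmark format or the authors' method.

```python
from itertools import product

# Hypothetical toy knowledge base of (head, relation, tail) facts;
# names are illustrative, not drawn from the paper's data.
FACTS = {
    ("Alice", "works_at", "AcmeCorp"),
    ("Bob", "works_at", "AcmeCorp"),
    ("Alice", "lives_in", "Seattle"),
    ("Bob", "lives_in", "Portland"),
    ("AcmeCorp", "located_in", "Seattle"),
}

ENTITIES = {e for (h, _, t) in FACTS for e in (h, t)}

# A "geometric" question: constraints over two blanks (?x, ?y) that must
# hold jointly. Each blank in isolation admits several candidates; only
# the joint assignment satisfying all constraints survives.
CONSTRAINTS = [
    ("?x", "works_at", "?y"),
    ("?x", "lives_in", "Seattle"),
    ("?y", "located_in", "Seattle"),
]

def instantiate(triple, assignment):
    """Replace blanks in a (head, relation, tail) triple with assigned entities."""
    return tuple(assignment.get(term, term) for term in triple)

def solve(constraints, facts, blanks=("?x", "?y")):
    """Brute-force joint search: keep assignments under which every constraint is a known fact."""
    solutions = []
    for values in product(ENTITIES, repeat=len(blanks)):
        assignment = dict(zip(blanks, values))
        if all(instantiate(c, assignment) in facts for c in constraints):
            solutions.append(assignment)
    return solutions

print(solve(CONSTRAINTS, FACTS))
# [{'?x': 'Alice', '?y': 'AcmeCorp'}]
```

Note that the constraint `?x works_at ?y` alone admits two candidates for `?x` (Alice and Bob); only when the `lives_in` and `located_in` constraints are considered jointly does a unique assignment survive. This reasoning-under-uncertainty across interweaving constraints, rather than step-by-step chaining, is what distinguishes geometric QA from linear multi-hop QA.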
Oct-2-2023