Divide and Translate: Compositional First-Order Logic Translation and Verification for Complex Logical Reasoning
Hyun Ryu, Gyeongman Kim, Hyemin S. Lee, Eunho Yang
–arXiv.org Artificial Intelligence
Complex logical reasoning tasks require a long sequence of reasoning steps, on which a large language model (LLM) with chain-of-thought prompting still falls short. To alleviate this issue, neurosymbolic approaches incorporate a symbolic solver: an LLM only translates a natural language problem into a satisfiability (SAT) problem consisting of first-order logic formulas, and a sound symbolic solver returns a mathematically correct solution. However, we discover that LLMs have difficulty capturing the complex logical semantics hidden in natural language during translation. To resolve this limitation, we propose a Compositional First-Order Logic Translation. An LLM first parses a natural language sentence into newly defined logical dependency structures, each consisting of an atomic subsentence and its dependents, and then sequentially translates the parsed subsentences. Since multiple logical dependency structures and sequential translations are possible for a single sentence, we also introduce two verification algorithms to ensure more reliable results. We use a SAT solver to rigorously compare the semantics of the generated first-order logic formulas and select the most probable one. We evaluate the proposed method, dubbed CLOVER, on seven logical reasoning benchmarks and show that it outperforms previous neurosymbolic approaches and achieves new state-of-the-art results.

Logical reasoning involves reaching conclusions through a structured process: drawing inferences by converting the information provided in a set of premises into a final conclusion (Nunes, 2012; Bronkhorst et al., 2020). Logical reasoning ability is one of the most challenging measures of intelligence. As model sizes have grown exponentially, large language models (LLMs) (Brown, 2020; Chen et al., 2021; Thoppilan et al., 2022) have unlocked the ability of machines to reason.
Chain-of-thought (CoT) prompting (Wei et al., 2022) significantly improves the performance of LLMs on simple logical reasoning tasks that require few forward reasoning steps. However, CoT falls short on complex logical reasoning tasks that require longer sequences of reasoning (Ye et al., 2024; Pan et al., 2023).
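The solver-based semantic comparison described in the abstract can be illustrated with a minimal sketch. The core idea is that two candidate formulas are semantically equivalent iff their exclusive-or is unsatisfiable; a real implementation would hand this check to a SAT/SMT solver over first-order translations, but the propositional toy below (exhaustive truth-table enumeration over named atoms, with formulas as hypothetical Python callables) conveys the same check. All names here are illustrative, not taken from the paper's code.

```python
from itertools import product

def equivalent(f, g, atoms):
    """Decide semantic equivalence of two propositional formulas.

    f and g are callables mapping a truth assignment (dict of atom
    name -> bool) to a bool. They are equivalent iff f XOR g is
    unsatisfiable, i.e. no assignment makes them disagree.
    """
    for values in product([False, True], repeat=len(atoms)):
        env = dict(zip(atoms, values))
        if f(env) != g(env):  # found a model of f XOR g
            return False
    return True

# Two candidate translations of "if it rains, the ground is wet":
f1 = lambda e: (not e["rain"]) or e["wet"]        # rain -> wet
f2 = lambda e: (not e["wet"]) or e["rain"]        # converse: wet -> rain
f3 = lambda e: not (e["rain"] and not e["wet"])   # ¬(rain ∧ ¬wet)

print(equivalent(f1, f3, ["rain", "wet"]))  # True: same semantics
print(equivalent(f1, f2, ["rain", "wet"]))  # False: converse differs
```

Among a pool of candidate translations, such pairwise equivalence checks let one cluster semantically identical formulas and, as the paper describes, select the most probable cluster.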
Oct-10-2024