CLOMO: Counterfactual Logical Modification with Large Language Models

Yinya Huang, Ruixin Hong, Hongming Zhang, Wei Shao, Zhicheng Yang, Dong Yu, Changshui Zhang, Xiaodan Liang, Linqi Song

arXiv.org Artificial Intelligence 

In our study, we delve into evaluating large language models' (LLMs) ability to generate counterfactually coherent thoughts. Specifically, we propose an innovative evaluation system that quantitatively measures the evolution of information in statement pairs, ensuring that they adhere to a specified logical relationship. Our approach includes designing a specialized task where models are presented with mismatched argument-premise pairs bound by a specific logical relation. The objective ...

Although large language models (Arkoudas, 2023; OpenAI, 2022) perform strikingly well on many reasoning benchmarks (Cobbe et al., 2021; Hendrycks et al., 2021a), recent studies observe an internal inconsistency in their reasoning processes (Saparov and He, 2023; Arkoudas, 2023). This inconsistency is attributed to the misunderstanding and misapplication of logical relations. However, logical relations in complex language reasoning are not yet properly quantified and evaluated.
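As a rough illustration of the task and evaluation setup described above, the sketch below pairs each argument with a premise and a target logical relation, and scores a model's modified argument by asking a judge whether the relation now holds. This is a minimal sketch under stated assumptions, not the paper's actual implementation: the names (ClomoInstance, relation_holds, the field names) are hypothetical, and the paper's real data format and evaluation metric may differ.

    from dataclasses import dataclass
    from typing import Callable, List

    # Hypothetical data structure for one task instance: an argumentative text,
    # the (mismatched) premise it is paired with, and the target logical relation.
    @dataclass
    class ClomoInstance:
        argument: str   # original argumentative text the model must modify
        premise: str    # premise the modified argument should relate to
        relation: str   # target logical relation, e.g. "strengthen"

    def evaluate_modifications(
        instances: List[ClomoInstance],
        modified_arguments: List[str],
        relation_holds: Callable[[str, str, str], bool],
    ) -> float:
        """Fraction of modified arguments that restore the target relation.

        relation_holds(argument, premise, relation) is a placeholder judge,
        for example an evaluator LLM prompted with a yes/no question.
        """
        assert len(instances) == len(modified_arguments)
        if not instances:
            return 0.0
        hits = sum(
            relation_holds(new_arg, inst.premise, inst.relation)
            for inst, new_arg in zip(instances, modified_arguments)
        )
        return hits / len(instances)

The judge is left as a parameter so the sketch stays model-agnostic; in practice it would be instantiated with whatever evaluator (human or LLM) the benchmark uses to decide whether the specified logical relation holds between the modified argument and its premise.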