CLOMO: Counterfactual Logical Modification with Large Language Models
Yinya Huang, Ruixin Hong, Hongming Zhang, Wei Shao, Zhicheng Yang, Dong Yu, Changshui Zhang, Xiaodan Liang, Linqi Song
arXiv.org Artificial Intelligence
In our study, we delve into the realm of evaluating large language models' (LLMs) ability to generate counterfactually coherent thoughts. Specifically, we propose an innovative evaluation system that quantitatively measures the evolution of information in statement pairs, ensuring that they adhere to a specified logical relationship. Our approach includes designing a specialized task where models are presented with mismatched argument-premise pairs bound by a specific logical relation. The objective …

Although large language models (Arkoudas, 2023; OpenAI, 2022) perform strikingly well on plenty of reasoning benchmarks (Cobbe et al., 2021; Hendrycks et al., 2021a), recent studies observe an internal inconsistency in their reasoning processes (Saparov and He, 2023; Arkoudas, 2023). The inconsistency is attributed to the misunderstanding and misapplication of logical relations. However, logical relations in complex language reasoning are not yet properly quantified and evaluated.
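The task setup described above lends itself to a concrete illustration. Below is a minimal Python sketch of how a CLOMO-style instance, a mismatched argument-premise pair bound by a target logical relation, might be represented and turned into a modification prompt. The `ClomoInstance` fields, the `build_prompt` helper, and the example texts are illustrative assumptions for exposition, not the paper's released data format or interface.

```python
from dataclasses import dataclass


@dataclass
class ClomoInstance:
    """One hypothetical counterfactual-modification instance."""
    argument: str   # argumentative text the model is asked to modify
    premise: str    # perturbed premise that no longer supports the argument
    relation: str   # target logical relation, e.g. "necessary assumption"


def build_prompt(inst: ClomoInstance) -> str:
    """Format a mismatched argument-premise pair as a modification prompt."""
    return (
        f"Premise: {inst.premise}\n"
        f"Argument: {inst.argument}\n\n"
        f"The premise no longer stands in the relation '{inst.relation}' "
        f"to the argument. Minimally rewrite the argument so that the "
        f"relation holds again, and output only the rewritten argument."
    )


if __name__ == "__main__":
    example = ClomoInstance(
        argument=("The city should expand bike lanes, because commuting "
                  "by bike reduces traffic congestion."),
        premise="Most commuters live too far from work to travel by bike.",
        relation="necessary assumption",
    )
    print(build_prompt(example))
```

In this framing, the model's output can be compared against the original statement pair to check whether the specified logical relation has been restored, which is the property the paper's evaluation system quantifies.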
Nov-29-2023
- Country:
  - Asia
    - China > Guangdong Province (0.14)
    - Middle East > UAE (0.14)
  - North America > United States
    - Minnesota > Hennepin County > Minneapolis (0.14)
- Genre:
  - Research Report > New Finding (0.87)
- Industry:
  - Energy > Oil & Gas (0.47)
  - Government (0.95)
  - Health & Medicine (0.95)
  - Media > News (0.48)