radiculopathy
Are LLMs Capable of Data-based Statistical and Causal Reasoning? Benchmarking Advanced Quantitative Reasoning with Data
Liu, Xiao, Wu, Zirui, Wu, Xueqing, Lu, Pan, Chang, Kai-Wei, Feng, Yansong
Quantitative reasoning is a critical skill for analyzing data, yet the assessment of this ability remains limited. To address this gap, we introduce the Quantitative Reasoning with Data (QRData) benchmark, which aims to evaluate Large Language Models' capability in statistical and causal reasoning with real-world data. The benchmark comprises a carefully constructed dataset of 411 questions accompanied by data sheets from textbooks, online learning materials, and academic papers. To compare models' quantitative reasoning abilities on data and text, we enrich the benchmark with an auxiliary set of 290 text-only questions, namely QRText. We evaluate natural language reasoning, program-based reasoning, and agent reasoning methods, including Chain-of-Thought, Program-of-Thoughts, ReAct, and code interpreter assistants, on diverse models. The strongest model, GPT-4, achieves an accuracy of 58%, leaving much room for improvement. Among open-source models, Deepseek-coder-instruct, a code LLM pretrained on 2T tokens, achieves the highest accuracy of 37%. Analysis reveals that models encounter difficulties in data analysis and causal reasoning, and struggle to use causal knowledge and provided data simultaneously. Code and data are available at https://github.com/xxxiaol/QRData.
Neuropathic Pain Diagnosis Simulator for Causal Discovery Algorithm Evaluation
Tu, Ruibo, Zhang, Kun, Bertilson, Bo Christer, Kjellström, Hedvig, Zhang, Cheng
Discovery of causal relations from observational data is essential for many disciplines of science and real-world applications. However, unlike traditional machine learning algorithms, whose development has been greatly fostered by a large number of available benchmark datasets, causal discovery algorithms are notoriously difficult to evaluate systematically because few datasets with known ground-truth causal relations are available. In this work, we address the problem of evaluating causal discovery algorithms by building a flexible simulator in the medical setting. We develop a neuropathic pain simulator, inspired by the fact that the biological processes of neuropathic pathophysiology are well studied, with well-understood causal influences. Our simulator exploits the causal graph of the neuropathic pain pathology, and the parameters of its generator are estimated from real-life patient cases. We show that data generated from our simulator have the same statistics as real-world data. As a clear advantage, the simulator can produce an unlimited number of samples without jeopardizing the privacy of real-world patients. Our simulator provides a natural tool for evaluating various types of causal discovery algorithms, including those that deal with practical issues in causal discovery, such as unknown confounders, selection bias, and missing data. Using our simulator, we have extensively evaluated causal discovery algorithms under various settings.
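The idea of generating synthetic patient records from a known causal graph can be illustrated with a minimal sketch of ancestral sampling. This is not the authors' simulator: the three-node graph, the node names (`disc_lesion`, `radiculopathy`, `arm_pain`), and all probabilities below are hypothetical values chosen only for illustration, whereas the abstract states that the real generator's parameters are estimated from patient cases.

```python
import random

# Hypothetical toy causal graph (NOT the simulator's real graph).
# Nodes are listed in topological order: parents before children.
PARENTS = {
    "disc_lesion": [],
    "radiculopathy": ["disc_lesion"],
    "arm_pain": ["radiculopathy"],
}

# Made-up conditional probabilities P(node = 1 | parent values).
# In a real simulator these would be estimated from data.
CPT = {
    "disc_lesion": {(): 0.3},
    "radiculopathy": {(0,): 0.05, (1,): 0.8},
    "arm_pain": {(0,): 0.1, (1,): 0.7},
}


def sample_record(rng):
    """Draw one binary record by sampling each node given its parents."""
    record = {}
    for node, parents in PARENTS.items():
        key = tuple(record[p] for p in parents)
        record[node] = 1 if rng.random() < CPT[node][key] else 0
    return record


def simulate(n, seed=0):
    """Generate n synthetic records; a fixed seed makes runs repeatable."""
    rng = random.Random(seed)
    return [sample_record(rng) for _ in range(n)]


records = simulate(1000)
```

Because the generating graph is known, the output of a causal discovery algorithm run on `records` could be compared edge by edge against `PARENTS`, which is the kind of ground-truth evaluation the abstract describes.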
Causality Refined Diagnostic Prediction
Klasson, Marcus, Zhang, Kun, Bertilson, Bo C., Zhang, Cheng, Kjellström, Hedvig
Applying machine learning in the health care domain has shown promising results in recent years. Interpretable outputs from learning algorithms are desirable for decision making by health care personnel. In this work, we explore the possibility of utilizing causal relationships to refine diagnostic prediction. We focus on the task of diagnostic prediction using discomfort drawings, and explore two ways to employ causal identification to improve the diagnostic results. Firstly, we use causal identification to infer the causal relationships among diagnostic labels which, by itself, provides interpretable results to aid the decision making and training of health care personnel. Secondly, we suggest a post-processing approach where the inferred causal relationships are used to refine the prediction accuracy of a multi-view probabilistic model. Experimental results show firstly that causal identification is capable of detecting the causal relationships among diagnostic labels correctly, and secondly that there is potential for improving pain diagnostics prediction accuracy using the causal relationships.