 independent variable


How LLMs Comprehend Temporal Meaning in Narratives: A Case Study in Cognitive Evaluation of LLMs

de Langis, Karin, Park, Jong Inn, Schramm, Andreas, Hu, Bin, Le, Khanh Chi, Mensink, Michael, Tong, Ahn Thu, Kang, Dongyeop

arXiv.org Artificial Intelligence

Large language models (LLMs) exhibit increasingly sophisticated linguistic capabilities, yet the extent to which these behaviors reflect human-like cognition versus advanced pattern recognition remains an open question. In this study, we investigate how LLMs process the temporal meaning of linguistic aspect in narratives that were previously used in human studies. Using an Expert-in-the-Loop probing pipeline, we conduct a series of targeted experiments to assess whether LLMs construct semantic representations and pragmatic inferences in a human-like manner. Our findings show that LLMs over-rely on prototypicality, produce inconsistent aspectual judgments, and struggle with causal reasoning derived from aspect, raising concerns about their ability to fully comprehend narratives. These results suggest that LLMs process aspect fundamentally differently from humans and lack robust narrative understanding. Beyond these empirical findings, we develop a standardized experimental framework for the reliable assessment of LLMs' cognitive and linguistic capabilities.


Leveraging LLM-based agents for social science research: insights from citation network simulations

Ji, Jiarui, Lei, Runlin, Pan, Xuchen, Wei, Zhewei, Sun, Hao, Lin, Yankai, Chen, Xu, Yang, Yongzheng, Li, Yaliang, Ding, Bolin, Wen, Ji-Rong

arXiv.org Artificial Intelligence

The emergence of Large Language Models (LLMs) demonstrates their potential to encapsulate the logic and patterns inherent in human behavior simulation by leveraging extensive web data pre-training. However, the boundaries of LLM capabilities in social simulation remain unclear. To further explore the social attributes of LLMs, we introduce the CiteAgent framework, designed to generate citation networks based on human-behavior simulation with LLM-based agents. CiteAgent successfully captures predominant phenomena in real-world citation networks, including power-law distribution, citational distortion, and shrinking diameter. Building on this realistic simulation, we establish two LLM-based research paradigms in social science: LLM-SE (LLM-based Survey Experiment) and LLM-LE (LLM-based Laboratory Experiment). These paradigms facilitate rigorous analyses of citation network phenomena, allowing us to validate and challenge existing theories. Additionally, we extend the research scope of traditional science of science studies through idealized social experiments, with the simulation experiment results providing valuable insights for real-world academic environments. Our work demonstrates the potential of LLMs for advancing science of science research in social science.



Downscaling human mobility data based on demographic socioeconomic and commuting characteristics using interpretable machine learning methods

Jiang, Yuqin, Popov, Andrey A., Duan, Tianle, Li, Qingchun

arXiv.org Artificial Intelligence

Understanding urban human mobility patterns at various spatial levels is essential for social science. This study presents a machine learning framework to downscale origin-destination (OD) taxi trip flows in New York City from a larger spatial unit to a smaller spatial unit. First, correlations between OD trips and demographic, socioeconomic, and commuting characteristics are developed using four models: Linear Regression (LR), Random Forest (RF), Support Vector Machine (SVM), and Neural Networks (NN). Second, a perturbation-based sensitivity analysis is applied to interpret variable importance for nonlinear models. The results show that the linear regression model failed to capture the complex variable interactions. While NN performs best with the training and testing datasets, SVM shows the best generalization ability in downscaling performance. The methodology presented in this study provides both analytical advancement and practical applications to improve transportation services and urban development.
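
As a rough illustration of what a perturbation-based sensitivity analysis can look like, the sketch below shifts one input variable at a time and records how much a model's predictions move. It is not the authors' procedure: the function `perturbation_sensitivity`, the stand-in `toy_model`, the perturbation size, and the synthetic data are all assumptions made for this example.

```python
import numpy as np


def perturbation_sensitivity(predict, X, scale=0.1):
    """Mean absolute change in predictions when each column is shifted.

    Each feature j is shifted by `scale` times its standard deviation
    while all other features are left untouched (an assumed scheme).
    """
    base = predict(X)
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        X_pert = X.copy()
        X_pert[:, j] += scale * X[:, j].std()
        scores[j] = np.mean(np.abs(predict(X_pert) - base))
    return scores


def toy_model(X):
    """Hypothetical nonlinear model: uses features 0 and 1, ignores 2."""
    return 5.0 * np.tanh(X[:, 0]) + 0.5 * X[:, 1] ** 2


rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))  # synthetic inputs, not real trip data
scores = perturbation_sensitivity(toy_model, X)
print(scores)
```

A variable the model never uses receives a score of exactly zero, while larger scores mark variables whose perturbation changes the output most. Any fitted model exposing a prediction function could be passed in place of `toy_model`.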



AutoML-Med: A Framework for Automated Machine Learning in Medical Tabular Data

Francia, Riccardo, Leone, Maurizio, Leonardi, Giorgio, Montani, Stefania, Pennisi, Marzio, Striani, Manuel, D'Alfonso, Sandra

arXiv.org Artificial Intelligence

In recent years, the advent of deep learning and, in particular, transformer-based architectures, has significantly revolutionized the field of Artificial Intelligence (AI) in many scientific domains, including computer vision, natural language processing, and sequence modeling, thanks to the increasing availability of computational power and large-scale datasets. However, classical Machine Learning (ML) methods, such as decision trees, gradient-boosted trees, Support Vector Machines (SVMs), and regression-based techniques, continue to be considered as the state-of-the-art for tabular data, which are still nowadays widely used in healthcare, finance, industrial monitoring, and other structured-data domains. There are several reasons for this. Notably, conventional AI models tend to perform reasonably well on datasets of limited size, whereas state-of-the-art deep learning techniques typically require substantially larger amounts of data to generalize effectively. Moreover, many classical AI methods, such as regression, Bayesian approaches, rule-based systems, and tree-based models, are inherently more interpretable, a characteristic that is particularly valuable in high-stakes domains such as healthcare. In contrast, deep learning models often work as black boxes, limiting their explainability. As an example, Grinsztajn et al. [1] showed that tree-based ensembles like XGBoost and Random Forests consistently outperformed a wide range of contemporary deep learning models across dozens of medium-sized tabular datasets (


Helix 1.0: An Open-Source Framework for Reproducible and Interpretable Machine Learning on Tabular Scientific Data

Aguilar-Bejarano, Eduardo, Lea, Daniel, Sivakumar, Karthikeyan, Mase, Jimiama M., Omidvar, Reza, Li, Ruizhe, Kettle, Troy, Mitchell-White, James, Alexander, Morgan R, Winkler, David A, Figueredo, Grazziela

arXiv.org Artificial Intelligence

The massive increase in data in scientific research requires the development and application of robust tools for data analysis and machine learning (ML) that are findable, accessible, interoperable, reusable (FAIR) and interpretable. In domains such as biomaterials science, engineering, chemistry, healthcare and biosciences, data-driven discovery typically requires interdisciplinary teams. These teams collaborate to implement unbiased data pre-processing strategies, select appropriate modelling techniques, and interpret model outputs to accelerate and inform research outcomes and support rational design and decision-making. This process is often iterative, with experts providing feedback over long periods of time to refine models and optimise the methodology adopted. In cases where initial analysis identifies issues with the data, such as outliers, unbalanced data classes, or experimental measurement uncertainty, another round of data collection and pre-processing might be necessary. That means that data for the same problem are likely to be analysed multiple times using different dataset versions and methodological pipelines. For interdisciplinary co-development of analytics, there is also a need for tools that allow domain experts to focus on interpreting and using analysis results, rather than developing code. The widespread use of ML and the overwhelming availability of thousands of community-driven open-source packages in Python and R increases the barrier for interoperable and reusable data analysis methodologies. To facilitate accurate analytics, transparency, and modelling results comparison, there is a strong need for easy-to-use tools that automatically track data, all methodological choices, performance metrics, and corresponding results.


To MT or not to MT: An eye-tracking study on the reception by Dutch readers of different translation and creativity levels

Gerrits, Kyo, Guerberof-Arenas, Ana

arXiv.org Artificial Intelligence

This article presents the results of a pilot study involving the reception of a fictional short story translated from English into Dutch under four conditions: machine translation (MT), post-editing (PE), human translation (HT) and original source text (ST). The aim is to understand how creativity and errors in different translation modalities affect readers, specifically regarding cognitive load. Eight participants filled in a questionnaire, read a story using an eye-tracker, and conducted a retrospective think-aloud (RTA) interview. The results show that units of creative potential (UCP) increase cognitive load and that this effect is highest for HT and lowest for MT; no effect of error was observed. Triangulating the data with RTAs leads us to hypothesize that the higher cognitive load in UCPs is linked to increases in reader enjoyment and immersion. The effect of translation creativity on cognitive load in different translation modalities at word-level is novel and opens up new avenues for further research. All the code and data are available at https://github.com/INCREC/Pilot_to_MT_or_not_to_MT


asKAN: Active Subspace embedded Kolmogorov-Arnold Network

Zhou, Zhiteng, Xu, Zhaoyue, Liu, Yi, Wang, Shizhao

arXiv.org Artificial Intelligence

The Kolmogorov-Arnold Network (KAN) has emerged as a promising neural network architecture for small-scale AI+Science applications. However, it suffers from inflexibility in modeling ridge functions, which are widely used in representing the relationships in physical systems. This study investigates this inflexibility through the lens of the Kolmogorov-Arnold theorem, which starts the representation of multivariate functions from constructing the univariate components rather than combining the independent variables. Our analysis reveals that incorporating linear combinations of independent variables can substantially simplify the network architecture in representing the ridge functions. Inspired by this finding, we propose active subspace embedded KAN (asKAN), a hierarchical framework that synergizes KAN's function representation with active subspace methodology. The architecture strategically embeds active subspace detection between KANs, where the active subspace method is used to identify the primary ridge directions and the independent variables are adaptively projected onto these critical dimensions. The proposed asKAN is implemented in an iterative way without increasing the number of neurons in the original KAN. The proposed method is validated through function fitting, solving the Poisson equation, and reconstructing a sound field. Compared with KAN, asKAN significantly reduces the error using the same network architecture. The results suggest that asKAN enhances the capability of KAN in fitting and solving equations in the form of ridge functions.
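
To make the idea of identifying a ridge direction concrete, the following sketch applies the standard active subspace construction (eigen-decomposition of the averaged outer product of gradients) to a toy ridge function f(x) = sin(w·x). It covers only the detection step and is not the asKAN architecture itself. The direction `w`, the test function, and the helper `active_subspace` are hypothetical choices for illustration.

```python
import numpy as np


def active_subspace(grads: np.ndarray, k: int = 1):
    """Top-k eigenpairs of the gradient covariance C = mean(grad grad^T)."""
    C = grads.T @ grads / grads.shape[0]
    eigvals, eigvecs = np.linalg.eigh(C)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]     # reorder to descending
    return eigvals[order][:k], eigvecs[:, order][:, :k]


rng = np.random.default_rng(0)
w = np.array([3.0, 4.0, 0.0]) / 5.0        # assumed unit ridge direction
X = rng.uniform(-1.0, 1.0, size=(500, 3))  # synthetic sample points

# For f(x) = sin(w . x), the analytic gradient is cos(w . x) * w.
grads = np.cos(X @ w)[:, None] * w[None, :]

vals, vecs = active_subspace(grads, k=1)
direction = vecs[:, 0]
alignment = abs(direction @ w)  # sign of an eigenvector is arbitrary
projected = X @ direction       # inputs projected onto the ridge direction
print(direction, alignment)
```

Because every gradient of a ridge function is a multiple of `w`, the gradient covariance has rank one and its leading eigenvector recovers `w` up to sign. The projected coordinate `projected` is then a single variable that a univariate function can fit.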


Adversarial Debiasing for Unbiased Parameter Recovery

Sanford, Luke C, Ayers, Megan, Gordon, Matthew, Stone, Eliana

arXiv.org Machine Learning

Advances in machine learning and the increasing availability of high-dimensional data have led to the proliferation of social science research that uses the predictions of machine learning models as proxies for measures of human activity or environmental outcomes. However, prediction errors from machine learning models can lead to bias in the estimates of regression coefficients. In this paper, we show how this bias can arise, propose a test for detecting bias, and demonstrate the use of an adversarial machine learning algorithm in order to de-bias predictions. These methods are applicable to any setting where machine-learned predictions are the dependent variable in a regression. We conduct simulations and empirical exercises using ground truth and satellite data on forest cover in Africa. Using the predictions from a naive machine learning model leads to biased parameter estimates, while the predictions from the adversarial model recover the true coefficients.
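
The bias described above can be reproduced in a small simulation. The sketch below assumes one specific mechanism, namely prediction error that is correlated with the regressor, and compares it with purely random prediction error. It does not implement the authors' test or their adversarial model, and all numbers (true slope, error sizes, sample size) are invented for illustration.

```python
import numpy as np


def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x."""
    x_c = x - x.mean()
    return float(x_c @ (y - y.mean()) / (x_c @ x_c))


rng = np.random.default_rng(0)
n = 20_000
beta = 2.0  # assumed true coefficient

x = rng.normal(size=n)
y = beta * x + rng.normal(size=n)  # ground-truth outcome

# Case 1: ML prediction error is independent of x.
y_hat_random = y + rng.normal(scale=0.5, size=n)

# Case 2: ML prediction error is correlated with x (assumed form).
y_hat_corr = y - 0.5 * x + rng.normal(scale=0.5, size=n)

slope_true = ols_slope(x, y)
slope_random = ols_slope(x, y_hat_random)
slope_corr = ols_slope(x, y_hat_corr)
print(slope_true, slope_random, slope_corr)
```

With independent error the estimated slope stays close to the true value and only becomes noisier. When the error depends on `x`, the slope is shifted by roughly the strength of that dependence (here about 0.5), which is the kind of bias a de-biasing procedure would aim to remove.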