Assessing the Human-Likeness of LLM-Driven Digital Twins in Simulating Health Care System Trust

Wu, Yuzhou, Wu, Mingyang, Liu, Di, Yin, Rong, Li, Kang

arXiv.org Artificial Intelligence 

As an emerging and powerful tool, Large Language Model (LLM)-driven Human Digital Twins are showing great potential in healthcare system research. However, their ability to simulate complex human psychological traits, such as distrust in the healthcare system, remains unclear. This research gap particularly affects health professionals' trust in and use of LLM-based Artificial Intelligence (AI) systems that assist their routine work. In this study, based on the Twin-2K-500 dataset, we systematically evaluated the responses of an LLM-driven human digital twin to the Health Care System Distrust Scale (HCSDS) against an established human-subject sample, analyzing item-level distributions, summary statistics, and demographic subgroup patterns. Results showed that the responses simulated by the digital twin were significantly more centralized, with lower variance and fewer selections of extreme options (all p < 0.001). While the digital twin broadly reproduced human results for major demographic patterns, such as age and gender, it exhibited relatively low sensitivity in capturing minor differences across education levels. The LLM-based digital twin simulation has the potential to simulate population trends, but it struggles to make detailed, specific distinctions among human subgroups. This study suggests that current LLM-driven Digital Twins have limitations in modeling complex human attitudes and require careful calibration and validation before being applied in inferential analyses or policy simulations in health systems engineering. Future studies are needed to examine the emotional reasoning mechanisms of LLMs before their use, particularly in studies that simulate sensitive social topics, such as human-automation trust.
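
The two dispersion measures named in the abstract, response variance and the share of extreme options selected, can be illustrated with a minimal Python sketch. This is not the authors' analysis code: the function name `extreme_share`, the 1-5 response coding, and the toy response lists are assumptions made purely for illustration, and no significance test is shown because the abstract does not say which test produced the reported p-values.

```python
from statistics import pvariance


def extreme_share(responses, low=1, high=5):
    """Fraction of responses that fall on either endpoint of the scale."""
    return sum(r in (low, high) for r in responses) / len(responses)


# Hypothetical toy responses to a single item, coded 1-5.
# These values are invented for illustration and are not from the study.
human = [1, 2, 3, 4, 5, 5, 1, 3, 2, 4]
twin = [3, 3, 2, 4, 3, 3, 4, 2, 3, 3]

# Population variance of each response set (spread around the mean).
human_var = pvariance(human)
twin_var = pvariance(twin)

# Share of responses at the scale endpoints (1 or 5).
human_ext = extreme_share(human)
twin_ext = extreme_share(twin)

print(f"variance: human={human_var:.2f}, twin={twin_var:.2f}")
print(f"extreme share: human={human_ext:.2f}, twin={twin_ext:.2f}")
```

In this toy data both groups share the same mean, yet the twin responses cluster around the midpoint, which is the kind of centralization the abstract describes. A real comparison would repeat this per item and add an appropriate test for equality of variances and of proportions.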