humanlikeness


HLB: Benchmarking LLMs' Humanlikeness in Language Use

Duan, Xufeng, Xiao, Bei, Tang, Xuemei, Cai, Zhenguang G.

arXiv.org Artificial Intelligence

As synthetic data becomes increasingly prevalent in training language models, particularly through generated dialogue, concerns have emerged that these models may deviate from authentic human language patterns, potentially losing the richness and creativity inherent in human communication. This highlights the critical need to assess the humanlikeness of language models in real-world language use. In this paper, we present a comprehensive humanlikeness benchmark (HLB) evaluating 20 large language models (LLMs) using 10 psycholinguistic experiments designed to probe core linguistic aspects, including sound, word, syntax, semantics, and discourse (see https://huggingface.co/spaces/XufengDuan/HumanLikeness). To anchor these comparisons, we collected responses from over 2,000 human participants and compared them to outputs from the LLMs in these experiments. For rigorous evaluation, we developed a coding algorithm that accurately identified language use patterns, enabling the extraction of response distributions for each task. By comparing the response distributions between human participants and LLMs, we quantified humanlikeness through distributional similarity. Our results reveal fine-grained differences in how well LLMs replicate human responses across various linguistic levels. Importantly, we found that improvements in other performance metrics did not necessarily lead to greater humanlikeness, and in some cases, even resulted in a decline. By introducing psycholinguistic methods to model evaluation, this benchmark offers the first framework for systematically assessing the humanlikeness of LLMs in language use.
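The abstract describes quantifying humanlikeness by comparing response distributions between humans and LLMs. One common measure of distributional similarity for discrete response categories is the Jensen-Shannon divergence; the sketch below is a minimal illustration of that general approach, not the specific metric or coding algorithm used by HLB (the response categories and data here are hypothetical).

```python
from collections import Counter
from math import log2

def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions,
    given as dicts mapping response category -> probability.
    Ranges from 0 (identical) to 1 (disjoint support), using log base 2."""
    keys = set(p) | set(q)
    # Mixture distribution M = (P + Q) / 2
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl(a, b):
        # KL divergence, skipping zero-probability categories in a
        return sum(a[k] * log2(a[k] / b[k]) for k in keys if a.get(k, 0.0) > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def response_distribution(responses):
    """Turn a list of coded responses into a probability distribution."""
    counts = Counter(responses)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

# Hypothetical coded responses from one psycholinguistic task
human = response_distribution(["high", "high", "low", "high"])
model = response_distribution(["high", "low", "low", "high"])

# Higher similarity = more humanlike on this task (1.0 = identical)
similarity = 1.0 - js_divergence(human, model)
```

In practice a benchmark like this would aggregate such per-task similarity scores across many experiments and participants; the single-task example above only shows the core distribution comparison.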


How Humanlike Should a Social Robot Be: A User-Centered Exploration

Lee, Hee Rin (Indiana University) | Šabanović, Selma (Indiana University) | Stolterman, Erik (Indiana University)

AAAI Conferences

Robot designers commonly emphasize humanlikeness as an important design feature to make robots social or user-friendly. To understand how users make sense of the design characteristics of robots, we asked 6 participants to classify and interpret the appearance of existing robots in relation to their function and potential usefulness. All the robots had humanlike aspects in their design, and participants most commonly remarked on these humanlike features of the robots. However, the commonsense logic of the “Uncanny Valley” (UV) in HRI design, which suggests that robots should be similar to humans to some degree without being too humanlike, was not supported by participant comments, which did not correlate humanlikeness to user-friendliness in line with the UV hypothesis. Rather, participants related the design features of robots to their everyday contexts, and focused their commentary on context-dependent design implications. As a result, we suggest our understanding of the design characteristics of robots should include the perspectives of users from the earliest stages of design so we can understand their contextual interpretations of different design characteristics. Open and modularized technical platforms could support the inclusion of users in the creation of future social robots.