Real or Robotic? Assessing Whether LLMs Accurately Simulate Qualities of Human Responses in Dialogue
Ivey, Jonathan, Kumar, Shivani, Liu, Jiayu, Shen, Hua, Rakshit, Sushrita, Raju, Rohan, Zhang, Haotian, Ananthasubramaniam, Aparna, Kim, Junghwan, Yi, Bowen, Wright, Dustin, Israeli, Abraham, Møller, Anders Giovanni, Zhang, Lechen, Jurgens, David
arXiv.org Artificial Intelligence
Studying and building datasets for dialogue tasks is both expensive and time-consuming due to the need to recruit, train, and collect data from study participants. In response, much recent work has sought to use large language models (LLMs) to simulate both human-human and human-LLM interactions, as they have been shown to generate convincingly human-like text in many settings. However, to what extent do LLM-based simulations actually reflect human dialogues? In this work, we answer this question by generating a large-scale dataset of 100,000 paired LLM-LLM and human-LLM dialogues from the WildChat dataset and quantifying how well the LLM simulations align with their human counterparts. Overall, we find relatively low alignment between simulations and human interactions, demonstrating systematic divergence along multiple textual properties, including style and content. Further, in comparisons of English, Chinese, and Russian dialogues, we find that models perform similarly. Our results suggest that LLMs generally perform better when the human writes in a way that is more similar to the LLM's own style.
Sep-16-2024
- Country:
- Africa > Sub-Saharan Africa (0.04)
- Asia
- Central Asia (0.04)
- Indonesia > Bali (0.04)
- Middle East > UAE
- Abu Dhabi Emirate > Abu Dhabi (0.14)
- Thailand > Bangkok
- Bangkok (0.04)
- Europe
- Bulgaria > Varna Province
- Varna (0.04)
- Denmark > Capital Region
- Copenhagen (0.04)
- North America
- Central America (0.04)
- United States
- Arkansas (0.04)
- Illinois > Champaign County
- Urbana (0.04)
- Michigan (0.04)
- Texas > Travis County
- Austin (0.04)
- Virginia (0.04)
- Oceania (0.04)
- South America (0.04)
- Genre:
- Research Report
- Experimental Study (1.00)
- New Finding (1.00)
- Technology: