A Personalized Conversational Benchmark: Towards Simulating Personalized Conversations

Li, Li, Cai, Peilin, Rossi, Ryan A., Dernoncourt, Franck, Kveton, Branislav, Wu, Junda, Yu, Tong, Song, Linxin, Yang, Tiankai, Qin, Yuehan, Ahmed, Nesreen K., Basu, Samyadeep, Mukherjee, Subhojyoti, Zhang, Ruiyi, Hu, Zhengmian, Ni, Bo, Zhou, Yuxiao, Wang, Zichao, Huang, Yue, Wang, Yu, Zhang, Xiangliang, Yu, Philip S., Hu, Xiyang, Zhao, Yue

May-27-2025–arXiv.org Artificial Intelligence

We present PersonaConvBench, a large-scale benchmark for evaluating personalized reasoning and generation in multi-turn conversations with large language models (LLMs). Unlike existing work that focuses on either personalization or conversational structure in isolation, PersonaConvBench integrates both, offering three core tasks: sentence classification, impact regression, and user-centric text generation across ten diverse Reddit-based domains. This design enables systematic analysis of how personalized conversational context shapes LLM outputs in realistic multi-user scenarios. We benchmark several commercial and open-source LLMs under a unified prompting setup and observe that incorporating personalized history yields substantial performance improvements, including a 198 percent relative gain over the best non-conversational baseline in sentiment classification. By releasing PersonaConvBench with evaluations and code, we aim to support research on LLMs that adapt to individual styles, track long-term context, and produce contextually rich, engaging responses.
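The "198 percent relative gain" reported above is a ratio to the baseline score, not a difference in percentage points. A minimal sketch of that arithmetic follows, assuming the standard definition of relative gain. The function name and both scores are hypothetical illustrations, not values from the paper.

```python
def relative_gain_percent(baseline: float, improved: float) -> float:
    """Relative gain of `improved` over `baseline`, expressed in percent.

    Assumes the standard definition: (improved - baseline) / |baseline| * 100.
    """
    if baseline == 0:
        raise ValueError("baseline must be non-zero for a relative gain")
    return (improved - baseline) / abs(baseline) * 100.0


if __name__ == "__main__":
    # Hypothetical scores chosen only to illustrate the formula;
    # they are NOT results reported by PersonaConvBench.
    baseline_score = 0.25       # best non-conversational baseline (made up)
    with_history_score = 0.745  # same model with personalized history (made up)
    gain = relative_gain_percent(baseline_score, with_history_score)
    print(f"relative gain: {gain:.1f}%")
```

Under this definition, a gain near 198 percent means the score with personalized history is roughly three times the baseline score.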

classification, large language model, machine learning

Country:
- North America > United States > California (0.28)

Genre:
- Research Report > New Finding (0.46)

Industry:
- Health & Medicine (0.46)
- Education (0.46)
- Media > News (0.34)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning
    - Neural Networks > Deep Learning (0.72)
    - Performance Analysis > Accuracy (0.68)
