PersoBench: Benchmarking Personalized Response Generation in Large Language Models
Afzoon, Saleh; Naseem, Usman; Beheshti, Amin; Jamali, Zahra
arXiv.org Artificial Intelligence
Large Language Models (LLMs) have revolutionized NLP, excelling at human-like text generation across domains and becoming central to dialogue systems. Evaluating their ability to generate personalized responses that enhance user engagement is therefore crucial, especially in applications such as customer service, where tailored interactions boost satisfaction [1]. Recent benchmarks such as RPBench-Auto [2], TIMECHARA [3], and RoleLLM [4] assess LLMs' adherence to predefined characters or roles in character-based, scene-based, and temporal role-playing setups, but the literature still lacks a dedicated benchmark for automatic personalized response generation by LLMs. Existing benchmarks also suffer from evaluation biases introduced by using large LLMs as judges, and they are constrained by limited experimental sizes. To fill this gap, we introduce PersoBench, a benchmark for response personalization that assesses the strengths and limitations of current LLMs in generating personalized responses. To the best of our knowledge, no prior work has introduced a comprehensive benchmark specifically focused on evaluating response personalization in LLMs. Using comprehensive datasets and a diverse set of established metrics, including fluency, diversity, and coherence, we ensure a robust evaluation of various aspects of response generation, drawing on insights from a recent survey in the field [1]. More specifically, in line with this objective, we aim to answer the following research questions: 1. Can LLMs generate fluent responses?
Oct-4-2024
- Country:
- Asia > Middle East
- Iran > Fars Province > Shiraz (0.04)
- Europe
- North America
- Canada > Ontario
- Toronto (0.04)
- United States
- California
- San Diego County > San Diego (0.04)
- Santa Clara County > Palo Alto (0.04)
- Michigan > Washtenaw County
- Ann Arbor (0.04)
- Oceania > Australia
- New South Wales > Sydney (0.04)
- Genre:
- Research Report > New Finding (0.66)
- Technology: