PersoBench: Benchmarking Personalized Response Generation in Large Language Models

Afzoon, Saleh; Naseem, Usman; Beheshti, Amin; Jamali, Zahra

Oct-4-2024–arXiv.org Artificial Intelligence

Large Language Models (LLMs) have revolutionized NLP, excelling at human-like text generation across domains and becoming central to dialogue systems. However, evaluating their ability to generate personalized responses that enhance user engagement is crucial, especially in applications such as customer service, where tailored interactions boost satisfaction [1]. Recent benchmarks such as RPBench-Auto [2], TIMECHARA [3] and RoleLLM [4] have been introduced in the role-playing domain to assess LLMs' adherence to predefined characters or roles in character-based, scene-based, and temporal setups, but the literature still lacks a dedicated benchmark for automatic personalized response generation by LLMs. Existing benchmarks are also constrained by limited experimental sizes, and their evaluations are prone to bias because they rely on large LLMs as judges. To fill this gap, we introduce PersoBench, a benchmark for response personalization that assesses the strengths and limitations of current LLMs in generating personalized responses. To the best of our knowledge, no prior work has introduced a comprehensive benchmark specifically focused on evaluating response personalization in LLMs. Using comprehensive datasets and a diverse set of established metrics, including fluency, diversity, and coherence, we ensure a robust evaluation of various aspects of response generation, drawing on insights from a recent survey in the field [1]. More specifically, in line with this objective, we aim to answer the following research questions: 1. Can LLMs generate fluent responses?
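
The abstract names diversity as one of the evaluation metrics but does not say how it is computed. As an illustration only, the sketch below uses distinct-n, a common diversity measure for generated text: the ratio of unique n-grams to total n-grams over a set of responses. The function name `distinct_n`, the whitespace tokenization, and the lowercasing are assumptions made for this example and are not taken from PersoBench.

```python
def distinct_n(responses, n=2):
    """Return the ratio of unique n-grams to total n-grams across responses.

    Assumption: simple lowercase whitespace tokenization; a real evaluation
    would likely use a proper tokenizer.
    """
    total = 0
    unique = set()
    for text in responses:
        tokens = text.lower().split()
        # Slide a window of size n over the tokens of this response.
        for i in range(len(tokens) - n + 1):
            unique.add(tuple(tokens[i:i + n]))
            total += 1
    # Avoid division by zero when no n-grams could be formed.
    return len(unique) / total if total else 0.0


if __name__ == "__main__":
    sample = ["I love hiking on weekends", "I love cooking on weekends"]
    print(distinct_n(sample, n=1))
    print(distinct_n(sample, n=2))
```

Higher values indicate less repetition across the generated responses; the score says nothing on its own about whether a response reflects the user's persona.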

large language model, machine learning, natural language

Country:
- Oceania > Australia
  - New South Wales > Sydney (0.04)
- North America
  - United States
    - Michigan > Washtenaw County
      - Ann Arbor (0.04)
    - California
      - Santa Clara County > Palo Alto (0.04)
      - San Diego County > San Diego (0.04)
  - Canada > Ontario
    - Toronto (0.04)
- Europe
  - Portugal > Lisbon
    - Lisbon (0.04)
  - Italy > Tuscany
    - Florence (0.04)
- Asia > Middle East
  - Iran > Fars Province > Shiraz (0.04)

Genre:
- Research Report > New Finding (0.66)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.76)