Consistently Simulating Human Personas with Multi-Turn Reinforcement Learning

Abdulhai, Marwa, Cheng, Ryan, Clay, Donovan, Althoff, Tim, Levine, Sergey, Jaques, Natasha

Nov-4-2025–arXiv.org Artificial Intelligence

Large Language Models (LLMs) are increasingly used to simulate human users in interactive settings such as therapy, education, and social role-play. While these simulations enable scalable training and evaluation of AI agents, off-the-shelf LLMs often drift from their assigned personas, contradict earlier statements, or abandon role-appropriate behavior. We introduce a unified framework for evaluating and improving persona consistency in LLM-generated dialogue. We define three automatic metrics (prompt-to-line consistency, line-to-line consistency, and Q&A consistency) that capture different types of persona drift, and we validate each against human annotations. Using these metrics as reward signals, we apply multi-turn reinforcement learning to fine-tune LLMs for three user roles: a patient, a student, and a social chat partner. Our method reduces inconsistency by over 55%, resulting in more coherent and faithful simulated users.

large language model, machine learning, natural language

Country:
- Asia (1.00)
- North America > United States
  - Wisconsin (0.28)

Genre:
- Questionnaire & Opinion Survey (0.93)
- Personal > Interview (0.92)
- Research Report
  - New Finding (1.00)
  - Experimental Study (1.00)

Industry:
- Government (1.00)
- Health & Medicine
  - Therapeutic Area > Psychiatry/Psychology (1.00)
  - Consumer Health (0.92)
- Education > Educational Setting
  - K-12 Education (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
