Consistently Simulating Human Personas with Multi-Turn Reinforcement Learning
Marwa Abdulhai, Ryan Cheng, Donovan Clay, Tim Althoff, Sergey Levine, Natasha Jaques
arXiv.org Artificial Intelligence
Large Language Models (LLMs) are increasingly used to simulate human users in interactive settings such as therapy, education, and social role-play. While these simulations enable scalable training and evaluation of AI agents, off-the-shelf LLMs often drift from their assigned personas, contradict earlier statements, or abandon role-appropriate behavior. We introduce a unified framework for evaluating and improving persona consistency in LLM-generated dialogue. We define three automatic metrics (prompt-to-line consistency, line-to-line consistency, and Q&A consistency) that capture different types of persona drift, and we validate each against human annotations. Using these metrics as reward signals, we apply multi-turn reinforcement learning to fine-tune LLMs for three user roles: a patient, a student, and a social chat partner. Our method reduces inconsistency by over 55%, resulting in more coherent and faithful simulated users.
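The abstract describes turning the three consistency metrics into reward signals for multi-turn RL. A minimal sketch of that idea is shown below; the metric names come from the abstract, but the scoring interface, weights, and aggregation are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch: combine the three persona-consistency metrics
# (prompt-to-line, line-to-line, Q&A) into a scalar turn-level reward.
# Each metric is assumed to be a score in [0, 1], where 1 = fully
# consistent; the equal default weights are a placeholder choice.

def consistency_reward(prompt_to_line: float,
                       line_to_line: float,
                       qa_consistency: float,
                       weights=(1.0, 1.0, 1.0)) -> float:
    """Weighted average of the three consistency scores."""
    scores = (prompt_to_line, line_to_line, qa_consistency)
    total = sum(w * s for w, s in zip(weights, scores))
    return total / sum(weights)

# Example: a turn that matches the persona prompt and prior lines
# well, but partially fails a persona Q&A probe.
reward = consistency_reward(0.9, 1.0, 0.5)  # -> 0.8
```

In a multi-turn RL setup, a reward of this form would be computed per simulated-user turn and fed to the policy-gradient update that fine-tunes the LLM.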
Nov-4-2025