OpenCharacter: Training Customizable Role-Playing LLMs with Large-Scale Synthetic Personas
Xiaoyang Wang, Hongming Zhang, Tao Ge, Wenhao Yu, Dian Yu, Dong Yu
arXiv.org Artificial Intelligence
Customizable role-playing in large language models (LLMs), also known as character generalization, is gaining increasing attention for its versatility and cost-efficiency in developing and deploying role-playing dialogue agents. This study explores a large-scale data synthesis approach to equip LLMs with character generalization capabilities. We begin by synthesizing large-scale character profiles from Persona Hub personas and then explore two strategies, response rewriting and response generation, to create character-aligned instructional responses. To validate the effectiveness of our synthetic instruction-tuning data for character generalization, we perform supervised fine-tuning (SFT) on the LLaMA-3 8B model. Our best-performing model strengthens the original LLaMA-3 8B Instruct model and achieves performance comparable to GPT-4o models on role-playing dialogue. We release our synthetic characters and instruction-tuning dialogues to support public research.
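The two synthesis strategies in the abstract can be sketched as prompt builders. This is a hypothetical illustration, not the paper's actual code: all function names, field names, and prompt wording are assumptions, and the LLM call that would consume these prompts is stubbed out.

```python
# Hypothetical sketch of the two data-synthesis strategies described in the
# abstract; names and prompt wording are illustrative, not from the paper.

def make_character_profile(persona: str) -> dict:
    """Wrap a short Persona Hub persona as a character record.

    In the paper, profile expansion is done by an LLM; here we only
    stub the data structure that later prompt builders consume.
    """
    return {"persona": persona}

def response_rewriting_prompt(character: dict, instruction: str,
                              generic_answer: str) -> str:
    # Strategy 1: rewrite an existing generic response in the character's voice.
    return (
        f"Character profile: {character['persona']}\n"
        f"Instruction: {instruction}\n"
        f"Generic response: {generic_answer}\n"
        "Rewrite the response so it stays faithful to the content but "
        "matches the character's persona and speaking style."
    )

def response_generation_prompt(character: dict, instruction: str) -> str:
    # Strategy 2: generate a character-aligned response directly,
    # with no generic response to start from.
    return (
        f"Character profile: {character['persona']}\n"
        f"Instruction: {instruction}\n"
        "Answer in the character's voice."
    )

character = make_character_profile("a retired submarine sonar operator")
print(response_rewriting_prompt(character,
                                "Explain how sonar works.",
                                "Sonar uses sound waves to detect objects."))
print(response_generation_prompt(character, "Explain how sonar works."))
```

Either prompt would be sent to a strong teacher LLM, and the resulting (character, instruction, response) triples would form the SFT corpus.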
Feb-17-2025
- Country:
- North America > United States (0.67)
- Genre:
- Research Report (1.00)
- Industry:
- Leisure & Entertainment > Games > Computer Games (0.50)