Teaching Language Models to Evolve with Users: Dynamic Profile Modeling for Personalized Alignment

Zhao, Weixiang, Sui, Xingyu, Hu, Yulin, Guo, Jiahe, Liu, Haixiao, Li, Biye, Zhao, Yanyan, Qin, Bing, Liu, Ting

Dec-12-2025–arXiv.org Artificial Intelligence

Personalized alignment is essential for enabling large language models (LLMs) to engage effectively in user-centric dialogue. While recent prompt-based and offline optimization methods offer preliminary solutions, they fall short in cold-start scenarios and long-term personalization due to their inherently static and shallow designs. In this work, we introduce the Reinforcement Learning for Personalized Alignment (RLPA) framework, in which an LLM interacts with a simulated user model to iteratively infer and refine user profiles through dialogue. The training process is guided by a dual-level reward structure: the Profile Reward encourages accurate construction of user representations, while the Response Reward incentivizes generation of responses consistent with the inferred profile. We instantiate RLPA by fine-tuning Qwen-2.5-3B-Instruct, resulting in Qwen-RLPA, which achieves state-of-the-art performance in personalized dialogue. Empirical evaluations demonstrate that Qwen-RLPA consistently outperforms prompting and offline fine-tuning baselines, and even surpasses advanced commercial models such as Claude-3.5 and GPT-4o. Further analysis highlights Qwen-RLPA's robustness in reconciling conflicting user preferences, sustaining long-term personalization and delivering more efficient inference compared to recent reasoning-focused LLMs. These results emphasize the potential of dynamic profile inference as a more effective paradigm for building personalized dialogue systems.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

Dec-12-2025

arXiv.org PDF

Add feedback

Genre:
- Research Report
  - Experimental Study (1.00)
  - New Finding (0.92)

Industry:
- Information Technology > Security & Privacy (0.93)
- Leisure & Entertainment (0.67)
- Education > Curriculum
  - Subject-Specific Education (0.40)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found