Teaching Language Models to Evolve with Users: Dynamic Profile Modeling for Personalized Alignment
Neural Information Processing Systems
Personalized alignment is essential for enabling large language models (LLMs) to engage effectively in user-centric dialogue. While recent prompt-based and offline optimization methods offer preliminary solutions, they fall short in cold-start scenarios and in long-term personalization because their designs are inherently static and shallow. In this work, we introduce the Reinforcement Learning for Personalized Alignment (RLPA) framework, in which an LLM interacts with a simulated user model to iteratively infer and refine user profiles through dialogue. Training is guided by a dual-level reward structure: the Profile Reward encourages accurate construction of user representations, while the Response Reward incentivizes responses that are consistent with the inferred profile.
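To make the dual-level reward concrete, here is a minimal sketch under stated assumptions; it is not the RLPA implementation. It assumes a user profile can be represented as attribute-value pairs, that the Profile Reward is an F1 score between the inferred profile and the simulated user's ground-truth profile, and that the Response Reward comes from a judge that scores a response against the inferred profile. The names `profile_reward`, `keyword_judge`, `dual_level_reward`, and the mixing weight `alpha` are all hypothetical, and the keyword judge is a toy stand-in for what would more plausibly be a model-based judge.

```python
from typing import Callable, Dict

# Hypothetical profile representation: attribute -> value, e.g. {"diet": "vegetarian"}
Profile = Dict[str, str]


def profile_reward(inferred: Profile, target: Profile) -> float:
    """Hypothetical Profile Reward: F1 between the inferred and the
    ground-truth (attribute, value) pairs of the simulated user."""
    pred = set(inferred.items())
    gold = set(target.items())
    if not pred and not gold:
        return 1.0  # nothing to infer and nothing inferred
    hits = len(pred & gold)
    if hits == 0:
        return 0.0
    precision = hits / len(pred)
    recall = hits / len(gold)
    return 2 * precision * recall / (precision + recall)


def keyword_judge(response: str, profile: Profile) -> float:
    """Toy Response Reward: fraction of profile values mentioned in the response.
    Assumption only; a real system would likely use a learned or LLM judge."""
    if not profile:
        return 0.0
    text = response.lower()
    return sum(value.lower() in text for value in profile.values()) / len(profile)


def dual_level_reward(
    inferred: Profile,
    target: Profile,
    response: str,
    judge: Callable[[str, Profile], float] = keyword_judge,
    alpha: float = 0.5,  # hypothetical weight between the two reward levels
) -> float:
    """Combine both levels: profile accuracy against the simulated user's
    true profile, and response consistency with the *inferred* profile."""
    r_profile = profile_reward(inferred, target)
    r_response = judge(response, inferred)
    return alpha * r_profile + (1 - alpha) * r_response


if __name__ == "__main__":
    target = {"diet": "vegetarian", "city": "Lyon"}
    inferred = {"diet": "vegetarian", "city": "Paris"}
    reply = "Here is a vegetarian recipe you could cook in Paris."
    print(profile_reward(inferred, target))            # 0.5
    print(dual_level_reward(inferred, target, reply))  # 0.75
```

In this toy run the model recovers one of two attributes (Profile Reward 0.5) but writes a reply fully consistent with what it inferred (Response Reward 1.0), so the weighted total is 0.75. Scoring the response against the inferred rather than the true profile mirrors the abstract's description and keeps the two reward levels separable.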
Jun-16-2026, 01:23:06 GMT
- Genre:
  - Research Report
    - Experimental Study (1.00)
    - New Finding (0.92)
- Industry:
  - Information Technology > Security & Privacy (0.93)
  - Leisure & Entertainment (0.67)
  - Education > Curriculum
    - Subject-Specific Education (0.40)
- Technology: