User Preference Modeling for Conversational LLM Agents: Weak Rewards from Retrieval-Augmented Interaction