Eliciting User Preferences for Personalized Multi-Objective Decision Making through Comparative Feedback

Han Shao, Lee Cohen, Avrim Blum, Yishay Mansour, Aadirupa Saha, Matthew R. Walter

arXiv.org Artificial Intelligence 

In classic reinforcement learning (RL) and decision making problems, policies are evaluated with respect to a scalar reward function, and all optimal policies are equivalent in terms of their expected return. However, many real-world problems involve balancing multiple, sometimes conflicting, objectives whose relative priority varies according to the preferences of each user. Consequently, a policy that is optimal for one user might be sub-optimal for another. In this work, we propose a multi-objective decision making framework that accommodates different user preferences over objectives, where preferences are learned via policy comparisons. Our model consists of a Markov decision process with a vector-valued reward function, in which each user has an unknown preference vector that expresses the relative importance of each objective. The goal is to efficiently compute a near-optimal policy for a given user. We consider two user feedback models. In the first, a user is presented with two policies and returns their preferred one as feedback. In the second, a user is instead presented with two small weighted sets of representative trajectories and selects the preferred set. In both cases, we present an algorithm that finds a nearly optimal policy for the user using a small number of comparison queries.
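To make the interaction model concrete, the following minimal Python sketch (not taken from the paper) simulates the first feedback model: a hidden user preference vector scalarizes each policy's expected vector-valued return, and a comparison oracle reports which of two policies the user prefers. The names, the candidate-policy setup, and the naive tournament loop are illustrative assumptions; the paper's actual algorithms are designed to be far more query-efficient.

```python
import numpy as np


def scalarized_return(vector_return: np.ndarray, preference: np.ndarray) -> float:
    """A user's scalar utility for a policy: the preference-weighted sum
    of its expected vector-valued return."""
    return float(preference @ vector_return)


def comparison_oracle(return_a: np.ndarray, return_b: np.ndarray,
                      preference: np.ndarray) -> int:
    """Simulates the first feedback model: the user is shown two policies
    (represented here by their expected vector returns) and reports which
    one they prefer under their hidden preference vector."""
    ua = scalarized_return(return_a, preference)
    ub = scalarized_return(return_b, preference)
    return 0 if ua >= ub else 1


# Hypothetical setup: 3 objectives, a hidden user preference vector, and a
# small set of candidate policies summarized by their expected vector returns.
rng = np.random.default_rng(0)
hidden_preference = np.array([0.6, 0.3, 0.1])            # unknown to the learner
candidate_returns = rng.uniform(0.0, 1.0, size=(5, 3))   # one row per candidate policy

# Naive elicitation via pairwise comparisons: keep the winner of each query.
# This only illustrates the query protocol, not the paper's algorithm.
best = 0
for challenger in range(1, len(candidate_returns)):
    answer = comparison_oracle(candidate_returns[best],
                               candidate_returns[challenger],
                               hidden_preference)
    if answer == 1:
        best = challenger

print("Policy preferred by the simulated user:", best)
```

The key design point this sketch captures is that the learner never observes the preference vector directly; it only receives binary answers to comparison queries, from which a near-optimal policy must be inferred.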
