DP-Dueling: Learning from Preference Feedback without Compromising User Privacy

Saha, Aadirupa, Asi, Hilal

arXiv.org Artificial Intelligence 

Research has indicated that it is often more convenient, faster, and cost-effective to gather feedback in a relative manner rather than through absolute ratings [31, 40]. For example, when assessing an individual's preference between two items A and B, it is typically easier for respondents to answer a preference-oriented query such as "Which item do you prefer, A or B?" than to rate items A and B on a scale from 0 to 10. From the perspective of a system designer, leveraging this user preference data can significantly enhance system performance, especially when it can be collected in a relative and online fashion. This applies to many real-world scenarios in which human preferences are gathered online, including survey design, expert reviews, product selection, search engine optimization, recommender systems, crowd-sourcing platforms, training bots, multiplayer game rankings, online retail, and even broader reinforcement learning problems with complex reward structures, where it is often easier to elicit preference feedback than to rely on absolute ratings or rewards.

Because of its broad utility and the simplicity of gathering data through relative feedback, learning from preferences has become highly popular in the machine learning community. It has been studied extensively over the past decade under the name "Dueling Bandits" (DB), an extension of the traditional multi-armed bandit (MAB) setting [4]. In the DB framework, the goal is to identify a set of 'good' options from a fixed decision space using only relative preference feedback.
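To make the pairwise feedback model concrete, the following is a minimal sketch of a dueling-bandit interaction. It is not the paper's DP-Dueling algorithm: the latent arm utilities, the Bradley-Terry-style comparison model, the round-robin query schedule, and the Borda-style winner estimate are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical latent utilities for each arm (item); the learner never sees these.
utilities = np.array([0.1, 0.5, 0.9, 0.3])


def duel(i: int, j: int) -> int:
    """Return 1 if arm i wins the pairwise comparison against arm j, else 0.

    Assumes a Bradley-Terry-style model: the probability that i beats j is a
    logistic function of the utility gap.
    """
    p_i_beats_j = 1.0 / (1.0 + np.exp(-(utilities[i] - utilities[j])))
    return int(rng.random() < p_i_beats_j)


# Naive exploration: repeatedly query random pairs, track empirical win rates,
# and report the arm with the highest average win rate as the estimated winner.
n_arms, n_rounds = len(utilities), 2000
wins = np.zeros((n_arms, n_arms))
plays = np.zeros((n_arms, n_arms))

for _ in range(n_rounds):
    i, j = rng.choice(n_arms, size=2, replace=False)
    outcome = duel(i, j)
    wins[i, j] += outcome
    wins[j, i] += 1 - outcome
    plays[i, j] += 1
    plays[j, i] += 1

win_rate = wins.sum(axis=1) / np.maximum(plays.sum(axis=1), 1)
print("estimated best arm:", int(np.argmax(win_rate)))
```

The only information the learner receives here is the binary outcome of each duel, never a numeric rating, which is precisely the feedback constraint that distinguishes the DB setting from the standard MAB setting.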
