Tackling Biased Evaluators in Dueling Bandits
Neural Information Processing Systems
In dueling bandits, an agent explores and exploits choices (i.e., arms) by learning from stochastic feedback given as relative preferences between arms. Prior studies have focused on unbiased feedback. In practice, however, the feedback provided by evaluators can be biased. For example, human users with heterogeneous backgrounds are likely to give biased evaluations of large language models. In this work, we aim to minimize regret in dueling bandits when the evaluators' feedback is biased.
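To make the setting concrete, the following is a minimal sketch of a single pairwise comparison (a duel) judged by a possibly biased evaluator. The additive `bias` term, the function names, and the numeric values are all assumptions chosen for illustration; the abstract does not specify a bias model, and this is not the paper's algorithm.

```python
import random

def duel(p_true, bias, rng):
    """Return True if arm i is reported as the winner over arm j.

    p_true: true probability that arm i is preferred over arm j.
    bias:   hypothetical additive distortion applied by the evaluator
            (an assumption for illustration, not the paper's model).
    """
    # Clip so the distorted value is still a valid probability.
    p_reported = min(1.0, max(0.0, p_true + bias))
    return rng.random() < p_reported

def estimate_preference(p_true, bias, n_duels, seed=0):
    """Empirical win rate of arm i over n_duels simulated comparisons."""
    rng = random.Random(seed)
    wins = sum(duel(p_true, bias, rng) for _ in range(n_duels))
    return wins / n_duels

# Arm i is truly better (p_true > 0.5), but a negative bias can make it
# look worse than arm j in the observed feedback.
unbiased = estimate_preference(p_true=0.6, bias=0.0, n_duels=20000)
biased = estimate_preference(p_true=0.6, bias=-0.2, n_duels=20000)
print(f"unbiased estimate: {unbiased:.3f}")
print(f"biased estimate:   {biased:.3f}")
```

Under these assumed values, the biased estimate falls on the wrong side of 0.5, so a learner that trusts the raw feedback would tend to favor the worse arm. This is the kind of failure that motivates accounting for evaluator bias when minimizing regret.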
Jun-18-2026, 05:17:41 GMT
- Country:
- North America > United States (0.93)
- Europe (0.67)
- Genre:
- Research Report
- Experimental Study (1.00)
- New Finding (0.67)