Direct Preference-Based Evolutionary Multi-Objective Optimization with Dueling Bandits
Neural Information Processing Systems
The ultimate goal of multi-objective optimization (MO) is to assist human decision-makers (DMs) in identifying solutions of interest (SOI) that optimally reconcile multiple objectives according to their preferences. Yet current preference-based evolutionary MO (PBEMO) approaches are prone to inefficiency and to misalignment with the DM's true aspirations, especially when they inadvertently exploit mis-calibrated reward models. The stochastic nature of human feedback further exacerbates this problem. This paper proposes a novel framework that steers MO toward SOI by directly leveraging human feedback, without being restricted to a predefined reward model or requiring cumbersome model selection. Specifically, we develop a clustering-based stochastic dueling bandits algorithm that scales well to high-dimensional dueling bandits and achieves a regret of \mathcal{O}(K^2 \log T), where K is the number of clusters and T is the number of rounds.
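The abstract frames DM interaction as a stochastic dueling bandit over K clusters but does not spell out the selection rule. The sketch below is only an illustration of that setting, not the paper's clustering-based algorithm. It assumes the K cluster representatives already exist, simulates the DM's noisy pairwise answers with a Bradley-Terry model driven by hypothetical latent utilities, and substitutes a simplified version of the standard Relative Upper Confidence Bound (RUCB) rule. The names `bt_prob`, `rucb`, `utilities`, `horizon`, `alpha` and `seed` are all hypothetical.

```python
import math
import random


def bt_prob(u_i: float, u_j: float) -> float:
    """Bradley-Terry probability that option i is preferred over option j."""
    return 1.0 / (1.0 + math.exp(-(u_i - u_j)))


def rucb(utilities, horizon, alpha=0.51, seed=0):
    """Simplified RUCB over K arms (assumed: one representative per cluster).

    The DM is simulated by noisy Bradley-Terry comparisons using hypothetical
    latent utilities; the selection rule itself only sees duel outcomes.
    Returns how many times each arm took part in a duel.
    """
    rng = random.Random(seed)
    k = len(utilities)
    wins = [[0] * k for _ in range(k)]  # wins[i][j]: times arm i beat arm j
    plays = [0] * k

    for t in range(1, horizon + 1):
        # Optimistic estimate ucb[i][j] of P(i beats j); unseen pairs get 1.
        ucb = [[0.5] * k for _ in range(k)]
        for i in range(k):
            for j in range(k):
                if i == j:
                    continue
                n = wins[i][j] + wins[j][i]
                if n == 0:
                    ucb[i][j] = 1.0
                else:
                    ucb[i][j] = wins[i][j] / n + math.sqrt(alpha * math.log(t) / n)

        # Candidate champions: arms that optimistically beat or tie every arm.
        candidates = [i for i in range(k) if all(ucb[i][j] >= 0.5 for j in range(k))]
        c = rng.choice(candidates) if candidates else rng.randrange(k)

        # Challenger: the arm with the best optimistic chance of beating c.
        best = max(ucb[j][c] for j in range(k))
        d = rng.choice([j for j in range(k) if ucb[j][c] == best])

        plays[c] += 1
        plays[d] += 1
        if c != d:  # a self-duel carries no preference information
            if rng.random() < bt_prob(utilities[c], utilities[d]):
                wins[c][d] += 1
            else:
                wins[d][c] += 1
    return plays


if __name__ == "__main__":
    # Four hypothetical cluster representatives; index 3 is the DM's favourite.
    print(rucb([0.0, 0.5, 1.0, 2.0], horizon=2000))
```

Because the duels are held only among the K representatives, the win table stays at K × K entries however many candidate solutions the evolutionary search generates. Over time the sketch concentrates its duels on the representative the simulated DM prefers.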
May-27-2025, 19:18:08 GMT