Direct Preference-Based Evolutionary Multi-Objective Optimization with Dueling Bandits

May-27-2025, 19:18:08 GMT–Neural Information Processing Systems

The ultimate goal of multi-objective optimization (MO) is to assist human decision-makers (DMs) in identifying solutions of interest (SOI) that optimally reconcile multiple objectives according to their preferences. Yet current preference-based evolutionary MO (PBEMO) approaches tend to be inefficient and misaligned with the DM's true aspirations, especially when they inadvertently exploit mis-calibrated reward models. The stochastic nature of human feedback exacerbates the problem. This paper proposes a novel framework that steers MO toward SOI by leveraging human feedback directly, without relying on a predefined reward model or cumbersome model selection. Specifically, we develop a clustering-based stochastic dueling bandits algorithm that scales well to high-dimensional dueling bandits and achieves a regret of $\mathcal{O}(K^2 \log T)$, where $K$ is the number of clusters and $T$ is the number of rounds.
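
To make the dueling-bandit setting in the abstract concrete, the following is a minimal sketch of a generic UCB-style dueling bandit loop driven by noisy pairwise feedback. It is not the paper's clustering-based algorithm and makes no claim to its $\mathcal{O}(K^2 \log T)$ regret. The arms merely stand in for $K$ cluster representatives, and the names `pref`, `duel`, `dueling_bandit`, and `alpha` are hypothetical, introduced only for illustration.

```python
import math
import random


def duel(p_win, rng):
    """Simulate stochastic human feedback: True if the first option is preferred."""
    return rng.random() < p_win


def dueling_bandit(pref, rounds, seed=0, alpha=0.51):
    """Generic UCB-style dueling bandit (illustrative, NOT the paper's method).

    pref[i][j] is the hidden probability that arm i is preferred over arm j.
    Returns (winner, wins), where wins[i][j] counts how often i beat j.
    """
    rng = random.Random(seed)
    k = len(pref)
    wins = [[0] * k for _ in range(k)]

    def ucb(i, j, t):
        # Optimistic estimate of P(i beats j) after t rounds.
        if i == j:
            return 0.5
        n = wins[i][j] + wins[j][i]
        if n == 0:
            return 1.0  # never compared: stay fully optimistic
        bonus = math.sqrt(alpha * math.log(t) / n)
        return min(1.0, wins[i][j] / n + bonus)

    for t in range(1, rounds + 1):
        # Champion: an arm that could still plausibly beat every other arm.
        candidates = [i for i in range(k)
                      if all(ucb(i, j, t) >= 0.5 for j in range(k))]
        champion = rng.choice(candidates) if candidates else rng.randrange(k)
        # Challenger: the arm with the best optimistic chance against the champion.
        challenger = max((j for j in range(k) if j != champion),
                         key=lambda j: ucb(j, champion, t))
        # Query the (simulated) decision-maker and record the outcome.
        if duel(pref[champion][challenger], rng):
            wins[champion][challenger] += 1
        else:
            wins[challenger][champion] += 1

    def copeland(i):
        # Number of opponents that arm i beats in more than half of their duels.
        score = 0
        for j in range(k):
            n = wins[i][j] + wins[j][i]
            if j != i and n > 0 and wins[i][j] / n > 0.5:
                score += 1
        return score

    return max(range(k), key=copeland), wins


if __name__ == "__main__":
    # Hypothetical preference matrix: arm 0 is preferred over every other arm.
    pref = [[0.5, 0.8, 0.9],
            [0.2, 0.5, 0.7],
            [0.1, 0.3, 0.5]]
    winner, _ = dueling_bandit(pref, rounds=2000)
    print("estimated preferred arm:", winner)
```

In this sketch the only information the learner receives is the outcome of each duel, which mirrors the idea of consulting the DM directly rather than fitting a reward model first. The confidence bonus controlled by `alpha` is an assumed design choice, not a value taken from the paper.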

direct preference-based evolutionary multi-objective optimization, dueling bandit, human feedback

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Optimization (0.64)
  - Machine Learning
    - Evolutionary Systems (0.85)
    - Statistical Learning (0.62)