Pareto Regret Analyses in Multi-objective Multi-armed Bandit

Xu, Mengfan, Klabjan, Diego

arXiv.org Artificial Intelligence 

We study Pareto optimality in multi-objective multi-armed bandit by providing a formulation of adversarial multi-objective multi-armed bandit and defining its Pareto regrets that can be applied to both stochastic and adversarial settings. The regrets do not rely on any scalarization functions and reflect Pareto optimality compared to scalarized regrets. We also present new algorithms assuming both with and without prior information of the multi-objective multi-armed bandit setting.

... by minimizing some regret metric measuring how far the player is away from optimality. There are two ways to define optimality: Pareto optimality in the reward vector space and scalarized optimality by scalarizing reward vectors. Pareto optimality admits a Pareto optimal front, defined as the set of rewards of optimal arms determined by the Pareto order relationship. With limited information based on the definition of MO-MAB, it is a great ...
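The Pareto order relationship behind this front can be made precise as follows. This is a standard formalization consistent with the description above, not a quote from the paper, whose own notation may differ in details. For arms i = 1, ..., K with mean reward vectors mu_i in R^D:

```latex
% Pareto dominance: arm j dominates arm i iff it is at least as good in
% every objective and strictly better in at least one.
\mu_j \succ \mu_i
  \iff
  \mu_{j,d} \ge \mu_{i,d} \ \text{for all } d
  \quad \text{and} \quad
  \mu_{j,d'} > \mu_{i,d'} \ \text{for some } d'.

% Pareto optimal front: the set of non-dominated arms.
A^{\ast} = \bigl\{\, i \in \{1,\dots,K\} \;:\; \nexists\, j \ \text{with}\ \mu_j \succ \mu_i \,\bigr\}.
```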
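To make the Pareto front and its per-arm gaps concrete, here is a minimal NumPy sketch. It is our illustration, not code from the paper; `pareto_front` and `pareto_gap` are hypothetical helper names. `pareto_gap` computes the gap commonly used to define Pareto regret in stochastic settings: the smallest eps >= 0 such that adding eps to every objective of arm i makes it non-dominated.

```python
import numpy as np

def pareto_front(means: np.ndarray) -> list:
    """Indices of Pareto-optimal arms for a (K, D) array of mean rewards."""
    K = len(means)
    front = []
    for i in range(K):
        # Arm i is dominated if some arm j is >= in every objective
        # and strictly > in at least one.
        dominated = any(
            np.all(means[j] >= means[i]) and np.any(means[j] > means[i])
            for j in range(K) if j != i
        )
        if not dominated:
            front.append(i)
    return front

def pareto_gap(means: np.ndarray, i: int) -> float:
    """Smallest uniform increase eps that makes arm i non-dominated.

    This is the per-arm gap underlying the existing stochastic Pareto
    regret (sum of gaps of the pulled arms); it is 0 for arms on the front.
    """
    return max(0.0, max(float(np.min(means[j] - means[i]))
                        for j in pareto_front(means)))

# Example: 4 arms, 2 objectives; arm 3 is dominated by arm 1.
mu = np.array([[0.9, 0.2], [0.5, 0.5], [0.2, 0.9], [0.4, 0.4]])
print(pareto_front(mu))   # [0, 1, 2]
print(pareto_gap(mu, 3))  # 0.1: raising arm 3 by 0.1 per objective reaches arm 1
```

The double loop costs O(K^2 * D) per evaluation, which is cheap for the moderate numbers of arms and objectives typical in bandit experiments.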
