Pareto Regret Analyses in Multi-objective Multi-armed Bandit

Xu, Mengfan, Klabjan, Diego

arXiv.org Artificial Intelligence 

We study Pareto optimality in multi-objective multi-armed bandit by providing a formulation of adversarial multi-objective multi-armed bandit and defining its Pareto regrets that can be applied to both stochastic and adversarial settings. The regrets do not rely on any scalarization functions and reflect Pareto optimality compared to scalarized regrets. We also present new algorithms assuming both with and without prior information of the multi-objective multi-armed bandit setting.

... by minimizing some regret metric measuring how far the player is away from optimality. There are two ways to define optimality: Pareto optimality in the reward vector space and scalarized optimality by scalarizing reward vectors. Pareto optimality admits a Pareto optimal front, defined as the set of rewards of optimal arms determined by the Pareto order relationship. With limited information based on the definition of MO-MAB, it is a great ...
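The Pareto order relationship behind this front can be made precise as follows. This is a standard formalization consistent with the description above, not a quote from the paper, whose own notation may differ in details. For arms i = 1, ..., K with mean reward vectors mu_i in R^D:

```latex
% Pareto dominance: arm j dominates arm i iff it is at least as good in
% every objective and strictly better in at least one.
\mu_j \succ \mu_i
  \iff
  \mu_{j,d} \ge \mu_{i,d} \ \text{for all } d
  \quad \text{and} \quad
  \mu_{j,d'} > \mu_{i,d'} \ \text{for some } d'.

% Pareto optimal front: the set of non-dominated arms.
A^{\ast} = \bigl\{\, i \in \{1,\dots,K\} \;:\; \nexists\, j \ \text{with}\ \mu_j \succ \mu_i \,\bigr\}.
```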
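To make the Pareto front and its per-arm gaps concrete, here is a minimal NumPy sketch. It is our illustration, not code from the paper; `pareto_front` and `pareto_gap` are hypothetical helper names. `pareto_gap` computes the gap commonly used to define Pareto regret in stochastic settings: the smallest eps >= 0 such that adding eps to every objective of arm i makes it non-dominated.

```python
import numpy as np

def pareto_front(means: np.ndarray) -> list:
    """Indices of Pareto-optimal arms for a (K, D) array of mean rewards."""
    K = len(means)
    front = []
    for i in range(K):
        # Arm i is dominated if some arm j is >= in every objective
        # and strictly > in at least one.
        dominated = any(
            np.all(means[j] >= means[i]) and np.any(means[j] > means[i])
            for j in range(K) if j != i
        )
        if not dominated:
            front.append(i)
    return front

def pareto_gap(means: np.ndarray, i: int) -> float:
    """Smallest uniform increase eps that makes arm i non-dominated.

    This is the per-arm gap underlying the existing stochastic Pareto
    regret (sum of gaps of the pulled arms); it is 0 for arms on the front.
    """
    return max(0.0, max(float(np.min(means[j] - means[i]))
                        for j in pareto_front(means)))

# Example: 4 arms, 2 objectives; arm 3 is dominated by arm 1.
mu = np.array([[0.9, 0.2], [0.5, 0.5], [0.2, 0.9], [0.4, 0.4]])
print(pareto_front(mu))   # [0, 1, 2]
print(pareto_gap(mu, 3))  # 0.1: raising arm 3 by 0.1 per objective reaches arm 1
```

The double loop costs O(K^2 * D) per evaluation, which is cheap for the moderate numbers of arms and objectives typical in bandit experiments.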
