Pareto Regret Analyses in Multi-objective Multi-armed Bandit
We study Pareto optimality in multi-objective multi-armed bandit by providing a formulation of adversarial multi-objective multi-armed bandit and defining its Pareto regrets that can be applied to both stochastic and adversarial settings. The regrets do not rely on any scalarization functions and reflect Pareto optimality compared to scalarized regrets. We also present new algorithms assuming both with and without prior information of the multi-objective multi-armed bandit setting.

Multi-armed bandit problems are studied by minimizing some regret metric measuring how far the player is away from optimality. There are two ways to define optimality: Pareto optimality in the reward vector space, and scalarized optimality obtained by scalarizing reward vectors. Pareto optimality admits a Pareto optimal front, defined as the set of rewards of optimal arms determined by the Pareto order relationship. With limited information based on the definition of MO-MAB, it is a great …
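To make the Pareto order relationship concrete, the sketch below computes a Pareto optimal front from hypothetical mean reward vectors. The dominance rule used here is the standard one (componentwise at least as large, strictly larger in some objective); the arm means are illustrative and not taken from the paper.

```python
import numpy as np

def pareto_dominates(u, v):
    """u Pareto-dominates v if u >= v in every objective and u > v in at least one."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return bool(np.all(u >= v) and np.any(u > v))

def pareto_front(rewards):
    """Return indices of arms whose mean reward vectors are Pareto optimal,
    i.e. not dominated by any other arm."""
    rewards = np.asarray(rewards, dtype=float)
    return [
        i for i, r in enumerate(rewards)
        if not any(pareto_dominates(other, r)
                   for j, other in enumerate(rewards) if j != i)
    ]

# Hypothetical mean reward vectors for 4 arms with 2 objectives.
means = [[0.9, 0.2], [0.5, 0.5], [0.2, 0.9], [0.4, 0.4]]
print(pareto_front(means))  # -> [0, 1, 2]; arm 3 is dominated by arm 1
```

Arms 0, 1, and 2 are mutually incomparable under the Pareto order (each is better in one objective), so all three lie on the Pareto optimal front, while arm 3 is dominated.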