Accommodating Picky Customers: Regret Bound and Exploration Complexity for Multi-Objective Reinforcement Learning

Wu, Jingfeng, Braverman, Vladimir, Yang, Lin F.

arXiv.org Machine Learning 

In single-objective reinforcement learning (RL), a scalar reward is pre-specified and an agent learns a policy that maximizes the long-term cumulative reward [Azar et al., 2017, Jin et al., 2018]. In many real-world applications, however, we need to optimize multiple objectives in the same (unknown) environment, even when these objectives may conflict with one another [Roijers et al., 2013]. For example, in an autonomous driving application, each passenger may have a different preference for driving style: some passengers prefer a very steady ride, while others enjoy the car's fast acceleration. The traditional single-objective RL approach therefore cannot be applied directly in such scenarios. One way to tackle this issue is multi-objective reinforcement learning (MORL) [Roijers et al., 2013, Yang et al., 2019, Natarajan and Tadepalli, 2005, Abels et al., 2018], which models the multiple objectives with a vectorized reward and an additional preference vector that specifies the relative importance of each objective. A MORL agent must find policies that optimize the cumulative preference-weighted reward under all possible preferences.
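To make the preference-weighted objective concrete, the following is a minimal Python sketch, not code from the paper: the function name `scalarize`, the two-objective (comfort, speed) reward, and the example preference vectors are all illustrative assumptions. It shows how one vectorized reward yields different scalar rewards under different passenger preferences.

```python
import numpy as np

def scalarize(reward_vec: np.ndarray, preference: np.ndarray) -> float:
    """Scalarize a vectorized reward r in R^d with a preference vector w
    (nonnegative entries summing to 1): returns <w, r>."""
    return float(np.dot(preference, reward_vec))

# Hypothetical two-objective step reward for one (state, action) pair:
# index 0 = ride comfort, index 1 = speed.
reward_vec = np.array([0.8, 0.3])

# Two hypothetical passengers with different preference vectors.
steady_passenger = np.array([0.9, 0.1])  # values comfort highly
fast_passenger = np.array([0.2, 0.8])    # values speed highly

print(scalarize(reward_vec, steady_passenger))  # 0.75
print(scalarize(reward_vec, fast_passenger))    # 0.40
```

Under this scalarization, the same action looks good to the comfort-oriented passenger but mediocre to the speed-oriented one; the MORL agent must find, for every such preference vector, a policy maximizing the expected sum of these scalarized rewards over an episode.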
