Enabling Pareto-Stationarity Exploration in Multi-Objective Reinforcement Learning: A Multi-Objective Weighted-Chebyshev Actor-Critic Approach

Hairi, Fnu, Yang, Jiao, Zhou, Tianchen, Yang, Haibo, Dong, Chaosheng, Yang, Fan, Momma, Michinari, Gao, Yan, Liu, Jia

arXiv.org Artificial Intelligence 

In many multi-objective reinforcement learning (MORL) applications, being able to systematically explore the Pareto-stationary solutions under multiple non-convex reward objectives with theoretical finite-time sample complexity guarantee is an important and yet under-explored problem. This motivates us to take the first step and fill the important gap in MORL. Specifically, in this paper, we propose a \uline{M}ulti-\uline{O}bjective weighted-\uline{CH}ebyshev \uline{A}ctor-critic (MOCHA) algorithm for MORL, which judiciously integrates the weighted-Chebychev (WC) and actor-critic framework to enable Pareto-stationarity exploration systematically with finite-time sample complexity guarantee. Sample complexity result of MOCHA algorithm reveals an interesting dependency on $p_{\min}$ in finding an $ε$-Pareto-stationary solution, where $p_{\min}$ denotes the minimum entry of a given weight vector $\mathbf{p}$ in WC-scarlarization. By carefully choosing learning rates, the sample complexity for each exploration can be $\tilde{\mathcal{O}}(ε^{-2})$. Furthermore, simulation studies on a large KuaiRand offline dataset, show that the performance of MOCHA algorithm significantly outperforms other baseline MORL approaches.

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found