A Robust Policy Bootstrapping Algorithm for Multi-objective Reinforcement Learning in Non-stationary Environments

Abdelfattah, Sherif, Kasmarik, Kathryn, Hu, Jiankun

Aug-17-2023–arXiv.org Artificial Intelligence

Multi-objective Markov decision processes are a class of multi-objective optimization problems that involve sequential decision making while satisfying the Markov property of stochastic processes. Multi-objective reinforcement learning methods address these problems by combining the reinforcement learning paradigm with multi-objective optimization techniques. One major drawback of these methods is their lack of adaptability to non-stationary dynamics in the environment, because they adopt optimization procedures that assume stationarity when evolving a coverage set of policies that can solve the problem. This paper introduces a developmental optimization approach that can evolve the policy coverage set while exploring the preference space over the defined objectives in an online manner. We propose a novel multi-objective reinforcement learning algorithm that can robustly evolve a convex coverage set of policies online in non-stationary environments. We compare the proposed algorithm with two state-of-the-art multi-objective reinforcement learning algorithms in stationary and non-stationary environments. The results show that the proposed algorithm significantly outperforms the existing algorithms in non-stationary environments while achieving comparable results in stationary environments.
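To make the notion of a convex coverage set more concrete, the following minimal sketch picks out, from a fixed list of two-objective policy value vectors, those that are optimal for at least one linear preference weighting. It is not the algorithm proposed in the paper: it assumes linear scalarization, a stationary set of already-evaluated policies, and a simple grid sweep over the preference space. The function name `approximate_ccs`, its parameters, and the example values are all hypothetical.

```python
def approximate_ccs(values, n_weights=101):
    """Approximate the convex coverage set of two-objective value vectors.

    Assumption: preferences are linear weights (w1, 1 - w1) with w1 in [0, 1].
    A policy is kept if it maximizes the weighted sum for some sampled weight.
    Returns the sorted indices of the kept policies.
    """
    if n_weights < 2:
        raise ValueError("n_weights must be at least 2")
    if any(len(v) != 2 for v in values):
        raise ValueError("each value vector must have exactly two objectives")

    kept = set()
    for k in range(n_weights):
        w1 = k / (n_weights - 1)          # weight on the first objective
        w2 = 1.0 - w1                     # weight on the second objective
        scores = [w1 * v[0] + w2 * v[1] for v in values]
        # Index of the policy with the highest scalarized value for this weight.
        kept.add(max(range(len(values)), key=scores.__getitem__))
    return sorted(kept)


# Hypothetical value vectors (objective 1, objective 2) for five policies.
policy_values = [(1.0, 9.0), (5.0, 6.0), (9.0, 1.0), (4.0, 4.0), (3.0, 7.0)]
ccs_indices = approximate_ccs(policy_values)
print(ccs_indices)
```

In this example the policy valued (4, 4) is dominated by (5, 6) and is dropped. The policy valued (3, 7) is not dominated by any single policy, yet no linear weighting makes it the best choice, so it is also excluded from the convex coverage set. An online method for non-stationary environments, as described in the abstract, would additionally have to revise such a set as the value estimates change over time.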

artificial intelligence, machine learning, reinforcement learning, (19 more...)

Country:
- Oceania > Australia
  - New South Wales (0.04)
  - Australian Capital Territory > Canberra (0.04)
- North America > United States
  - Massachusetts > Suffolk County > Boston (0.04)
- Europe > Netherlands
  - North Holland > Amsterdam (0.04)
- Asia > Taiwan
  - Taiwan Province > Taipei (0.04)

Genre:
- Research Report
  - Experimental Study (1.00)
  - New Finding (0.68)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Optimization (1.00)
  - Machine Learning
    - Reinforcement Learning (1.00)
    - Performance Analysis > Accuracy (0.41)
    - Learning Graphical Models > Undirected Networks
      - Markov Models (0.35)
