PD-MORL: Preference-Driven Multi-Objective Reinforcement Learning Algorithm

Basaklar, Toygun, Gumussoy, Suat, Ogras, Umit Y.

arXiv.org Artificial Intelligence 

Multi-objective reinforcement learning (MORL) approaches have emerged to tackle many real-world problems with multiple conflicting objectives by maximizing a joint objective function weighted by a preference vector. These approaches find fixed, customized policies corresponding to preference vectors specified during training. However, design constraints and objectives typically change dynamically in real-life scenarios. Furthermore, storing a separate policy for each potential preference is not scalable. Hence, obtaining a set of Pareto-front solutions for the entire preference space of a given domain with a single training run is critical. To this end, we propose a novel MORL algorithm that trains a single universal network covering the entire preference space and scaling to continuous robotic tasks. The proposed approach, Preference-Driven MORL (PD-MORL), uses the preferences as guidance to update the network parameters. It also employs a novel parallelization approach to increase sample efficiency. We show that PD-MORL achieves up to 25% larger hypervolume on challenging continuous control tasks while using an order of magnitude fewer trainable parameters than prior approaches.

The main objective in a standard RL setting is to obtain a policy that maximizes a single cumulative reward by interacting with the environment. However, many real-world problems involve multiple, possibly conflicting, objectives. For example, a robotics task may need to maximize speed while minimizing energy consumption. In contrast to single-objective settings, performance is measured along multiple objectives, so there are multiple Pareto-optimal solutions as a function of the preference between objectives (Navon et al., 2020). Multi-objective reinforcement learning (MORL) approaches (Hayes et al., 2022) have emerged to tackle these problems by maximizing a vector of rewards depending on the preferences.
Existing approaches for multi-objective optimization generally transform the multidimensional objective space into a single dimension by statically assigning weights (preferences) to each objective (Liu et al., 2014).
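The static scalarization described above can be illustrated with a minimal sketch. The reward values and preference vectors here are hypothetical and only serve to show how a fixed weight assignment collapses a vector of per-objective returns into a single scalar, so that each choice of preference defines a different training objective (and hence a different customized policy):

```python
def scalarize(reward_vec, preference):
    """Linear scalarization: collapse a vector of per-objective returns
    into one scalar using a fixed preference (weight) vector."""
    # Preferences are conventionally non-negative and sum to one.
    assert abs(sum(preference) - 1.0) < 1e-9 and all(w >= 0 for w in preference)
    return sum(w * r for w, r in zip(preference, reward_vec))

# Hypothetical 2-objective episodic return: (speed, negative energy
# consumption), both to be maximized.
reward_vec = [8.0, -3.0]

# Different static preferences yield different scalar objectives.
print(scalarize(reward_vec, [0.5, 0.5]))  # 2.5
print(scalarize(reward_vec, [0.9, 0.1]))  # 6.9
```

Because the weights are fixed before training, an agent optimized for one preference vector gives no guarantee for any other, which is the scalability limitation the preceding paragraph points out.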
