Offline Constrained Multi-Objective Reinforcement Learning via Pessimistic Dual Value Iteration

Jan-19-2025, 08:09:11 GMT–Neural Information Processing Systems

In constrained multi-objective RL, the goal is to learn a policy that achieves the best performance, as specified by a multi-objective preference function, under a constraint. We focus on the offline setting, where the RL agent aims to learn the optimal policy from a given dataset. This scenario is common in real-world applications where interactions with the environment are expensive and constraint violations are dangerous. For this setting, we transform the original constrained problem into a primal-dual formulation, which is solved via dual gradient ascent. Moreover, we propose combining this approach with pessimism to handle the uncertainty in offline data, which leads to our Pessimistic Dual Iteration (PEDI).
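To make the primal-dual idea concrete, here is a minimal Python sketch of dual gradient ascent with pessimism. It is not the paper's PEDI algorithm. It assumes a single-state (bandit) problem, a linear preference function given by `weights`, a scalar cost constraint with a `budget`, and a count-based uncertainty penalty `beta / sqrt(n)`. The function name and all parameters are hypothetical and chosen only for illustration.

```python
import numpy as np

def pessimistic_dual_ascent(r_hat, c_hat, counts, weights, budget,
                            beta=1.0, eta=0.1, iters=200):
    """Toy dual ascent on pessimistic estimates (illustrative assumption only).

    r_hat:   (A, m) estimated reward vectors per action, from offline data
    c_hat:   (A,)   estimated constraint cost per action
    counts:  (A,)   number of dataset samples per action
    weights: (m,)   linear preference over the m objectives (assumed form)
    budget:  scalar upper limit on the expected cost
    """
    r_hat = np.asarray(r_hat, dtype=float)
    c_hat = np.asarray(c_hat, dtype=float)
    weights = np.asarray(weights, dtype=float)
    counts = np.asarray(counts, dtype=float)

    # Pessimism: penalize poorly covered actions by lowering their reward
    # estimate and raising their cost estimate.
    bonus = beta / np.sqrt(np.maximum(counts, 1.0))
    r_pess = r_hat @ weights - bonus
    c_pess = c_hat + bonus

    lam = 0.0                       # dual variable (Lagrange multiplier)
    visits = np.zeros(len(c_hat))   # how often each action was the primal solution
    for _ in range(iters):
        # Primal step: best action for the current Lagrangian.
        a = int(np.argmax(r_pess - lam * c_pess))
        visits[a] += 1
        # Dual step: projected gradient ascent on the constraint violation.
        lam = max(0.0, lam + eta * (c_pess[a] - budget))
    # Return the final multiplier and the averaged (mixed) policy.
    return lam, visits / iters

lam, policy = pessimistic_dual_ascent(
    r_hat=[[1.0, 0.0], [0.6, 0.4], [0.2, 0.2]],
    c_hat=[0.9, 0.3, 0.1],
    counts=[4, 100, 100],
    weights=[0.5, 0.5],
    budget=0.5,
)
print(lam, policy)
```

In this toy example the first action has the highest raw cost and only four samples, so the pessimistic penalty removes its appeal. The multiplier grows only while the selected action's pessimistic cost exceeds the budget, and returning the averaged policy is one common way to handle the oscillation that greedy primal steps can cause.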

constraint violation, offline constrained multi-objective reinforcement learning, pessimistic dual value iteration

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.93)