Trust Region Policy Optimization with Optimal Transport Discrepancies: Duality and Algorithm for Continuous Actions

Jan-15-2025, 05:28:29 GMT–Neural Information Processing Systems

Policy Optimization (PO) algorithms have proven particularly well suited to the high dimensionality of real-world continuous control tasks. In this context, Trust Region Policy Optimization methods are a popular approach to stabilizing policy updates. These methods usually rely on the Kullback-Leibler (KL) divergence to limit the change in the policy. The Wasserstein distance is a natural alternative to the KL divergence for defining trust regions or regularizing the objective function. However, state-of-the-art works either resort to approximations of it or do not provide an algorithm for continuous state-action spaces, reducing the applicability of the method. In this paper, we explore optimal transport discrepancies (which include the Wasserstein distance) to define trust regions, and we propose a novel algorithm, Optimal Transport Trust Region Policy Optimization (OT-TRPO), for continuous state-action spaces.
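To make the contrast between the two discrepancies concrete, the sketch below compares them on one-dimensional Gaussian action distributions at a single state. It is only an illustration under stated assumptions and is not the OT-TRPO algorithm: the Gaussian policy form, the function names `kl_gauss` and `w2_gauss`, and the chosen numbers are all hypothetical. It relies on the standard closed forms for the KL divergence and the 2-Wasserstein distance between univariate Gaussians.

```python
import math


def kl_gauss(mu_p, sigma_p, mu_q, sigma_q):
    """KL(P || Q) for univariate Gaussians P = N(mu_p, sigma_p^2), Q = N(mu_q, sigma_q^2)."""
    return (
        math.log(sigma_q / sigma_p)
        + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2.0 * sigma_q ** 2)
        - 0.5
    )


def w2_gauss(mu_p, sigma_p, mu_q, sigma_q):
    """2-Wasserstein distance between the same two univariate Gaussians."""
    return math.sqrt((mu_p - mu_q) ** 2 + (sigma_p - sigma_q) ** 2)


if __name__ == "__main__":
    # Hypothetical update: the mean action shifts by 0.1, the spread is unchanged.
    shift = 0.1
    for sigma in (1.0, 0.1, 0.01):
        kl = kl_gauss(0.0, sigma, shift, sigma)
        w2 = w2_gauss(0.0, sigma, shift, sigma)
        print(f"sigma={sigma:<5} KL={kl:.4f}  W2={w2:.4f}")
```

In this toy setting the same mean shift costs more and more under the KL divergence as the policy becomes nearly deterministic, while the Wasserstein distance stays equal to the size of the shift in action space. This is one commonly cited reason for considering transport-based trust regions; the example says nothing about how the paper constructs or solves its trust-region problem.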

duality and algorithm, optimal transport discrepancy, trust region policy optimization, (5 more...)

Genre:
- Research Report (0.39)

Technology:
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.41)