FlowPG: Action-constrained Policy Gradient with Normalizing Flows

Oct-11-2024, 14:02:22 GMT–Neural Information Processing Systems

Action-constrained reinforcement learning (ACRL) is a popular approach for solving safety-critical and resource-allocation decision-making problems. A major challenge in ACRL is to ensure that the agent takes a valid action, one that satisfies the constraints, at each RL step. The commonly used approach of placing a projection layer on top of the policy network requires solving an optimization program, which can result in longer training time, slow convergence, and the zero-gradient problem. To address this, first we use a normalizing flow model to learn an invertible, differentiable mapping between the feasible action space and the support of a simple distribution on a latent variable, such as a Gaussian. Second, learning the flow model requires sampling from the feasible action space, which is also challenging.
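To make the contrast between a projection layer and an invertible mapping concrete, here is a minimal sketch. It is not the paper's method and not a learned normalizing flow: it assumes a hypothetical one-dimensional box constraint `LOW < a < HIGH` and uses a hand-written scaled sigmoid as the bijection between the Gaussian latent support and the feasible set. All names (`LOW`, `HIGH`, `latent_to_action`, `project`, and so on) are illustrative assumptions.

```python
import math
import random

LOW, HIGH = -1.0, 2.0  # hypothetical box constraint on a 1-D action


def latent_to_action(z):
    """Map a real-valued latent z into the open interval (LOW, HIGH)."""
    return LOW + (HIGH - LOW) / (1.0 + math.exp(-z))


def action_to_latent(a):
    """Inverse map (scaled logit); defined only for LOW < a < HIGH."""
    u = (a - LOW) / (HIGH - LOW)
    return math.log(u / (1.0 - u))


def latent_to_action_grad(z):
    """d(action)/dz; nonzero unless the sigmoid saturates in floating point."""
    s = 1.0 / (1.0 + math.exp(-z))
    return (HIGH - LOW) * s * (1.0 - s)


def project(a):
    """Projection baseline: clip a raw action onto [LOW, HIGH]."""
    return min(max(a, LOW), HIGH)


def project_grad(a):
    """d(project)/da: 1 inside the box, 0 once the raw action is outside."""
    return 1.0 if LOW < a < HIGH else 0.0


# Sample latents from a standard Gaussian and push them through the map.
rng = random.Random(0)
samples = [rng.gauss(0.0, 1.0) for _ in range(1000)]
actions = [latent_to_action(z) for z in samples]

print("all feasible:", all(LOW < a < HIGH for a in actions))
print("projection gradient at raw action 5.0:", project_grad(5.0))
print("bijection gradient at latent 5.0:", latent_to_action_grad(5.0))
```

In this toy setting every sampled latent lands inside the feasible interval without any optimization step, and the gradient with respect to the latent stays nonzero where clipping would return zero. A learned flow would play the same role for constraint sets that have no closed-form bijection.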

action-constrained policy gradient, artificial intelligence, machine learning
