Projected Natural Actor-Critic
Neural Information Processing Systems
In this paper we address a drawback of natural actor-critics that limits their real-world applicability: their lack of safety guarantees. We present a principled algorithm for performing natural gradient descent over a constrained domain. In the context of reinforcement learning, this allows for natural actor-critic algorithms that are guaranteed to remain within a known safe region of policy space. While deriving our class of constrained natural actor-critic algorithms, which we call Projected Natural Actor-Critics (PNACs), we also elucidate the relationship between natural gradient descent and mirror descent.
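To make "natural gradient descent over a constrained domain" concrete, here is a minimal sketch on a toy problem. It is not the PNAC algorithm from the paper. It assumes a diagonal metric and a box-shaped safe region, in which case projecting under the metric-induced norm separates per coordinate and reduces to clipping. The function names, the quadratic objective, and all numeric values are hypothetical and chosen only for illustration.

```python
import numpy as np

def projected_natural_gradient_step(theta, grad, metric_diag, step, lower, upper):
    """One step of natural gradient descent followed by projection onto a box.

    Assumption: the metric G is diagonal (metric_diag holds its diagonal).
    For a diagonal G and a box constraint, the projection under the G-norm
    separates per coordinate, so it is plain clipping.
    """
    natural_grad = grad / metric_diag          # G^{-1} * grad for diagonal G
    proposal = theta - step * natural_grad     # unconstrained natural gradient step
    return np.clip(proposal, lower, upper)     # projection back into the safe box

# Toy objective (hypothetical): f(theta) = 0.5 * sum(a * (theta - c)**2)
a = np.array([1.0, 10.0])      # curvature per coordinate
c = np.array([2.0, -0.5])      # unconstrained minimiser (first entry lies outside the box)
lower = np.array([-1.0, -1.0])
upper = np.array([1.0, 1.0])

theta = np.zeros(2)
trajectory = [theta.copy()]
for _ in range(50):
    grad = a * (theta - c)     # gradient of the toy objective
    theta = projected_natural_gradient_step(theta, grad, a, 0.5, lower, upper)
    trajectory.append(theta.copy())

print(theta)
```

Every iterate stays inside the box by construction, which is the kind of guarantee the abstract refers to. For a non-diagonal metric or a more general safe region, the projection would itself be an optimization problem rather than a simple clip.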
Mar-13-2024, 21:24:22 GMT