Recurrent Natural Policy Gradient for POMDPs

May-28-2024–arXiv.org Machine Learning

In this paper, we study a natural policy gradient method based on recurrent neural networks (RNNs) for partially-observable Markov decision processes, whereby RNNs are used for policy parameterization and policy evaluation to address curse of dimensionality in non-Markovian reinforcement learning. We present finite-time and finite-width analyses for both the critic (recurrent temporal difference learning), and correspondingly-operated recurrent natural policy gradient method in the near-initialization regime. Our analysis demonstrates the efficiency of RNNs for problems with short-term memory with explicit bounds on the required network widths and sample complexity, and points out the challenges in the case of long-term dependencies.

neural network, pomdp, rec-td, (9 more...)

arXiv.org Machine Learning

May-28-2024

arXiv.org PDF

Add feedback

Country:
- North America > United States
  - Illinois (0.04)
  - Ohio > Franklin County
    - Columbus (0.04)
- Europe
  - United Kingdom > England
    - Cambridgeshire > Cambridge (0.04)
  - Germany > North Rhine-Westphalia
    - Cologne Region > Aachen (0.04)
- Asia > Middle East
  - Jordan (0.04)

Genre:
- Research Report (0.50)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Statistical Learning (1.00)
  - Reinforcement Learning (1.00)
  - Neural Networks (1.00)
  - Learning Graphical Models > Undirected Networks
    - Markov Models (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found