Online Expectation Maximization for Reinforcement Learning in POMDPs

Liu, Miao (Duke University) | Liao, Xuejun (Duke University) | Carin, Lawrence (Duke University)

AAAI Conferences 

We present online nested expectation maximization for model-free reinforcement learning in a POMDP. The algorithm evaluates the policy only on the current learning episode, discards the episode after the evaluation, and retains only the sufficient statistic, from which the policy is computed in closed form. As a result, the online algorithm has a time complexity of $O(n)$ and a memory complexity of $O(1)$, compared to $O(n^2)$ and $O(n)$ for the corresponding batch-mode algorithm, where $n$ is the number of learning episodes. The online algorithm, which is provably convergent, is demonstrated on five benchmark POMDP problems.
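The complexity gap can be illustrated with a toy sketch. It assumes a stateless problem in which an episode is a list of (action, reward) pairs and the sufficient statistic is a vector of reward-weighted action counts; both are hypothetical simplifications, not the paper's POMDP model or its nested EM updates. The batch version stores every episode and re-evaluates all of them after each new one, so it performs 1 + 2 + ... + n = n(n+1)/2 episode evaluations and keeps n episodes in memory. The online version evaluates each episode once, folds it into a fixed-size statistic, and discards it. The helper names (`episode_statistic`, `closed_form_policy`, `batch_mode`, `online_mode`) are illustrative only.

```python
from typing import Iterable, List, Tuple

# Hypothetical toy model (not the paper's POMDP setting): an episode is a
# list of (action, reward) pairs in a stateless problem with a fixed number
# of actions.
Episode = List[Tuple[int, float]]


def episode_statistic(episode: Episode, num_actions: int) -> List[float]:
    """Assumed additive statistic: reward-weighted action counts for one episode."""
    stat = [0.0] * num_actions
    for action, reward in episode:
        stat[action] += reward
    return stat


def closed_form_policy(stat: List[float]) -> List[float]:
    """Normalize the statistic into action probabilities (uniform if it is all zero)."""
    total = sum(stat)
    if total <= 0.0:
        return [1.0 / len(stat)] * len(stat)
    return [s / total for s in stat]


def batch_mode(episodes: List[Episode], num_actions: int):
    """Store every episode and re-evaluate all stored episodes after each new one."""
    stored: List[Episode] = []  # memory grows linearly with the number of episodes
    evaluations = 0
    policy = [1.0 / num_actions] * num_actions
    for episode in episodes:
        stored.append(episode)
        stat = [0.0] * num_actions
        for old in stored:  # 1 + 2 + ... + n evaluations in total
            evaluations += 1
            for a, s in enumerate(episode_statistic(old, num_actions)):
                stat[a] += s
        policy = closed_form_policy(stat)
    return policy, evaluations, len(stored)


def online_mode(episodes: Iterable[Episode], num_actions: int):
    """Evaluate only the current episode, fold it into a running statistic, discard it."""
    stat = [0.0] * num_actions  # size fixed by num_actions, independent of n
    evaluations = 0
    policy = [1.0 / num_actions] * num_actions
    for episode in episodes:
        evaluations += 1
        for a, s in enumerate(episode_statistic(episode, num_actions)):
            stat[a] += s
        policy = closed_form_policy(stat)
        # No reference to `episode` is kept after this iteration.
    return policy, evaluations, 0


if __name__ == "__main__":
    demo = [[(0, 1.0), (1, 0.0)], [(1, 2.0)], [(0, 1.0), (2, 1.0)]]
    print(batch_mode(demo, 3))
    print(online_mode(demo, 3))
```

On the three demo episodes, both versions return the policy [0.4, 0.4, 0.2], but the batch version performs 6 evaluations and stores 3 episodes, while the online version performs 3 evaluations and stores none. In this toy the per-episode statistic does not depend on the current policy, so the two versions agree exactly; that equivalence is a property of the toy and is not claimed for the paper's algorithm.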
