Reviews: Data-Efficient Reinforcement Learning in Continuous State-Action Gaussian-POMDPs

Neural Information Processing Systems 

This paper describes an extension to the PILCO algorithm (Probabilistic Inference for Learning COntrol), a data-efficient reinforcement learning algorithm. The proposed method applies a measurement filter during the actual experiment and explicitly accounts for this filter during the policy learning step, which uses data from that experiment. This is an important practical extension, since measurements are often very noisy. My intuitive explanation is that filtering out most of the noise makes the overall feedback system more "repeatable", so learning is faster (provided the filtering is effective; see the last sentence of the conclusion). The paper presents detailed mathematical derivations and strong simulation results that highlight the properties of the proposed algorithm.
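
To make the filtering idea concrete, below is a minimal sketch, not the authors' implementation (which filters with the learned GP dynamics model): a generic linear-Gaussian (Kalman) filter sits in the control loop, and the policy acts on the filtered belief mean rather than on the raw noisy observation. All names here (kalman_step, filtered_rollout, A, B, C, Q, R, env_step, policy) are illustrative assumptions.

import numpy as np

# Illustrative sketch only: a generic linear-Gaussian (Kalman) filter stands in
# for the paper's GP-based filtering. The policy acts on the filtered belief
# mean rather than on the raw noisy observation.
# All names (A, B, C, Q, R, env_step, policy) are assumed for illustration.

def kalman_step(mu, Sigma, u, y, A, B, C, Q, R):
    """One predict/update cycle of a linear-Gaussian filter."""
    mu_pred = A @ mu + B @ u                    # predict belief through dynamics
    Sigma_pred = A @ Sigma @ A.T + Q
    S = C @ Sigma_pred @ C.T + R                # innovation covariance
    K = Sigma_pred @ C.T @ np.linalg.inv(S)     # Kalman gain
    mu_new = mu_pred + K @ (y - C @ mu_pred)    # correct with noisy measurement y
    Sigma_new = (np.eye(len(mu)) - K @ C) @ Sigma_pred
    return mu_new, Sigma_new

def filtered_rollout(policy, env_step, mu0, Sigma0, A, B, C, Q, R, horizon=50):
    """Execute the policy on the filtered state estimate, not on raw observations."""
    mu, Sigma = mu0, Sigma0
    belief_means = [mu]
    for _ in range(horizon):
        u = policy(mu)                          # policy sees the belief mean
        y = env_step(u)                         # system returns a noisy measurement
        mu, Sigma = kalman_step(mu, Sigma, u, y, A, B, C, Q, R)
        belief_means.append(mu)
    return belief_means

The key point of the paper, as summarized above, is that this same filtering computation is also simulated during policy evaluation, so the learned controller is optimized for the filtered signal it will actually receive at execution time.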