On Polynomial Time PAC Reinforcement Learning with Rich Observations

Dann, Christoph, Jiang, Nan, Krishnamurthy, Akshay, Agarwal, Alekh, Langford, John, Schapire, Robert E.

arXiv.org Machine Learning 

We study episodic reinforcement learning (RL) when the observations may be realistically rich, such as images or text. We aim for methods that use function approximation in a provably effective manner to find the best possible policy through systematic exploration. While such problems are central to empirical RL research [22], most theoretical results on systematic exploration have focused on tabular MDPs with small state spaces [e.g., 19]. Until recently, little was known about how to engage in sophisticated exploration in the general function approximation setting to achieve global optimality in a statistically efficient manner. Indeed, as pointed out by Krishnamurthy et al. [20], no algorithm achieving polynomial sample complexity is possible without further assumptions. Nevertheless, when the underlying problem exhibits additional structure, it was recently shown that learning becomes statistically feasible. In particular, Krishnamurthy et al. [20] showed that reactive POMDPs with rich observations and deterministic dynamics over M hidden states can be learned with polynomial sample complexity that depends on M. Later, Jiang et al. [16] provided a new algorithm called O LIVE that In this paper, we directly address this difficult computational challenge. We adopt a reduction approach, meaning that we aim to design algorithms whose computation can be reduced to common optimization oracles over function spaces, such as linear optimization and cost-sensitive classification, while retaining the statistical properties of prior works.

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found