A Variant of the Wang-Foster-Kakade Lower Bound for the Discounted Setting

Amortila, Philip, Jiang, Nan, Xie, Tengyang

Nov-3-2020–arXiv.org Artificial Intelligence

Recently, Wang et al. (2020) showed a highly intriguing hardness result for batch reinforcement learning (RL) with linearly realizable value function and good feature coverage in the finite-horizon case. In this note we show that, once adapted to the discounted setting, the construction can be simplified to a 2-state MDP with 1-dimensional features, such that learning is impossible even with an infinite amount of data.

Wang et al. (2020) recently showed that in finite-horizon batch RL, the sample complexity of evaluating a given policy π has an information-theoretic lower bound that is exponential in the horizon, even if realizable linear features are given (i.e., ϕ: S × A → R

artificial intelligence, machine learning, reinforcement learning, (16 more...)
