 Reinforcement Learning








60cb558c40e4f18479664069d9642d5a-Paper.pdf

Neural Information Processing Systems

In real-world decision-making tasks, learning an optimal policy without a trial-and-error process is an appealing challenge. When expert demonstrations are available, imitation learning that mimics expert actions can learn a good policy efficiently.


An Exponential Lower Bound for Linearly-Realizable MDPs with Constant Suboptimality Gap

Neural Information Processing Systems

A fundamental question in the theory of reinforcement learning is: suppose the optimal Q-function lies in the linear span of a given d-dimensional feature mapping, is sample-efficient reinforcement learning (RL) possible? The recent and remarkable result of Weisz et al. (2020) resolves this question in the negative, providing an exponential (in d) sample size lower bound, which holds even if the agent has access to a generative model of the environment. One may hope that such a lower bound can be circumvented with an even stronger assumption that there is a constant gap between the optimal Q-value of the best action and that of the second-best action (for all states); indeed, the construction in Weisz et al.