An Exponential Lower Bound for Linearly-Realizable MDPs with Constant Suboptimality Gap
A fundamental question in the theory of reinforcement learning is: suppose the optimal Q-function lies in the linear span of a given $d$-dimensional feature mapping; is sample-efficient reinforcement learning (RL) possible? The recent and remarkable result of Weisz et al. (2020) resolves this question in the negative, providing an exponential (in $d$) sample size lower bound, which holds even if the agent has access to a generative model of the environment. One may hope that such a lower bound can be circumvented with an even stronger assumption: that there is a constant gap between the optimal Q-value of the best action and that of the second-best action (for all states); indeed, the construction in Weisz et al. (2020) relies crucially on an exponentially large action set.
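To make the two assumptions concrete, here is one standard way to state them (the notation below is ours; the papers' exact definitions may allow approximation slack). Linear realizability asks that the optimal Q-function be linear in a known feature map, and the gap condition asks that the best action win by a fixed margin at every state:

$$Q^*(s,a) = \langle \theta^*, \phi(s,a) \rangle \quad \text{for some } \theta^* \in \mathbb{R}^d \text{ and all } (s,a),$$

$$\min_{s} \Big( Q^*(s, \pi^*(s)) - \max_{a \neq \pi^*(s)} Q^*(s,a) \Big) \geq \Delta \quad \text{for a constant } \Delta > 0,$$

where $\phi : S \times A \to \mathbb{R}^d$ is the given feature mapping and $\pi^*(s) \in \arg\max_a Q^*(s,a)$.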
Computational Hardness of Reinforcement Learning with Partial $q^π$-Realizability
This paper investigates the computational complexity of reinforcement learning in a novel linear function approximation regime, termed partial $q^π$-realizability. In this framework, the objective is to learn an $ε$-optimal policy with respect to a predefined policy set $Π$, under the assumption that all value functions for policies in $Π$ are linearly realizable. The assumptions of this framework are weaker than those in $q^π$-realizability but stronger than those in $q^*$-realizability, providing a practical model where function approximation naturally arises. We prove that learning an $ε$-optimal policy in this setting is computationally hard. Specifically, we establish NP-hardness under a parameterized greedy policy set (argmax) and show that, unless NP = RP, an exponential lower bound (in the feature dimension) holds when the policy set contains softmax policies, under the Randomized Exponential Time Hypothesis. Our hardness results mirror those in $q^*$-realizability and suggest that computational difficulty persists even when $Π$ is expanded beyond the optimal policy. To establish this, we reduce from two complexity problems, $δ$-Max-3SAT and $δ$-Max-3SAT(b), to instances of GLinear-$κ$-RL (greedy policy) and SLinear-$κ$-RL (softmax policy). Our findings indicate that positive computational results are generally unattainable in partial $q^π$-realizability, in contrast to $q^π$-realizability under a generative access model.
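To see how the three regimes nest, it helps to write them side by side (again in our notation; the paper's formal definitions may differ in details such as approximation slack). Each regime quantifies linear realizability over a different set of policies:

$$q^*\text{-realizability:} \quad \exists\, \theta^* \text{ s.t. } q^*(s,a) = \langle \theta^*, \phi(s,a) \rangle \ \ \forall (s,a),$$

$$\text{partial } q^\pi\text{-realizability:} \quad \forall \pi \in Π,\ \exists\, \theta_\pi \text{ s.t. } q^\pi(s,a) = \langle \theta_\pi, \phi(s,a) \rangle \ \ \forall (s,a),$$

$$q^\pi\text{-realizability:} \quad \forall \pi,\ \exists\, \theta_\pi \text{ s.t. } q^\pi(s,a) = \langle \theta_\pi, \phi(s,a) \rangle \ \ \forall (s,a).$$

Quantifying over a larger policy set is a stronger assumption, so the partial regime with a fixed set $Π$ sits strictly between the other two, as the abstract notes.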