Computational Hardness of Reinforcement Learning with Partial qπ-Realizability

Neural Information Processing Systems 

This paper investigates the computational complexity of reinforcement learning in a novel linear function approximation regime, termed partial qπ-realizability. In this framework, the objective is to learn an ϵ-optimal policy with respect to a predefined policy set Π, under the assumption that the value functions of all policies in Π are linearly realizable. The assumptions of this framework are weaker than those of the qπ-realizability setting yet stronger than those of the q*-realizability setting. As a result, it provides a more practical model for reinforcement learning scenarios where function approximation naturally arises. We prove that learning an ϵ-optimal policy in this newly defined setting is computationally hard. More specifically, we establish NP-hardness under a parameterized greedy policy set (i.e., argmax) and further show that, unless NP = RP, an exponential lower bound (exponential in the feature vector dimension) holds when the policy set contains softmax policies, under the Randomized Exponential Time Hypothesis. Our hardness results mirror those obtained in the q*-realizability setting and suggest that computational difficulty persists even when the policy class Π is expanded beyond the optimal policy, which underscores the robustness of the hardness result for partial qπ-realizability under two important policy sets. To establish these negative results, our primary technical contribution is a pair of reductions from two complexity problems, δ-MAX-3SAT and δ-MAX-3SAT(b), to instances of our problem settings: GLINEAR-κ-RL (under the greedy policy set) and SLINEAR-κ-RL (under the softmax policy set), respectively. Our findings indicate that positive computational results are generally unattainable in the context of partial qπ-realizability, in sharp contrast to the qπ-realizability setting under a generative access model.
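The abstract does not spell out how the two policy sets are parameterized. As a minimal illustrative sketch, assume each state-action pair has a feature vector and a policy is indexed by a parameter vector `theta`: a greedy policy picks the action maximizing the linear score, and a softmax policy samples actions with probability proportional to the exponentiated score. The function names, the `temperature` parameter, and the example features below are hypothetical and are not taken from the paper.

```python
import math


def linear_score(theta, phi):
    """Inner product <theta, phi>: a linear score for one state-action pair."""
    return sum(t * f for t, f in zip(theta, phi))


def greedy_policy(theta, action_features):
    """Return the index of the action with the highest linear score (argmax).

    Assumption: action_features holds one feature vector per action
    available in the current state.
    """
    scores = [linear_score(theta, phi) for phi in action_features]
    return max(range(len(scores)), key=scores.__getitem__)


def softmax_policy(theta, action_features, temperature=1.0):
    """Return action probabilities proportional to exp(score / temperature)."""
    scores = [linear_score(theta, phi) / temperature for phi in action_features]
    shift = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - shift) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical example: three actions with two-dimensional features.
theta = [1.0, 0.0]
action_features = [[0.0, 1.0], [2.0, 0.0], [1.0, 5.0]]
chosen = greedy_policy(theta, action_features)
probs = softmax_policy(theta, action_features)
```

Under this sketch, the policy set Π would be the collection of such policies over all admissible `theta`; the hardness results concern finding a near-optimal member of Π, not evaluating a single one.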
