A Theoretical Analysis

Neural Information Processing Systems 

In this section, we provide a detailed theoretical analysis and proofs in linear MDPs [23].

A.1 LSVI Solution

In linear MDPs, we assume that the transition dynamics and reward function take the form of P

Theorem (Theorem 1 restated).

In experiments, we do not use explicit constraints (e.g., spectral regularization) for the upper bound

Corollary (Corollary 1 restated).

I given in Corollary 1.

To conclude, we obtain from Eq. (22) that |T V

First, we give the following lemma.