9f9e8cba3700df6a947a8cf91035ab84-AuthorFeedback.pdf
–Neural Information Processing Systems
Even if it is used, to guarantee approximation of the Q-function by φ^T θ, as stated in our Theorem 1, we require that the optimal θ is within the projection radius. Now consider an environment for which there exists a policy that can map any state to any state with nonzero probability (i.e., irreducibility holds) and can get back to the same state aperiodically. It is of great interest to further explore more advanced function spaces, such as deep neural networks.
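The two conditions mentioned above can be illustrated with a minimal sketch; it is not the authors' implementation. It assumes a finite state space, an explicit transition matrix under a fixed policy, and a Euclidean-ball projection. The names `project_to_ball`, `q_value`, and `is_irreducible_aperiodic` are hypothetical helpers introduced only for illustration.

```python
import math


def project_to_ball(theta, radius):
    """Euclidean projection of theta onto the L2 ball of the given radius.

    Assumption: the projection set is a ball centred at the origin.
    """
    norm = math.sqrt(sum(x * x for x in theta))
    if norm <= radius:
        return list(theta)  # already inside the ball, nothing to do
    scale = radius / norm
    return [scale * x for x in theta]


def q_value(phi, theta):
    """Linear approximation of Q(s, a) as the inner product phi(s, a)^T theta."""
    return sum(p * t for p, t in zip(phi, theta))


def is_irreducible_aperiodic(P):
    """Check whether a finite transition matrix P is irreducible and aperiodic.

    For a finite chain this is equivalent to P being primitive, i.e. some
    power of P is entrywise positive. By Wielandt's bound it is enough to
    inspect powers up to (n - 1)^2 + 1. Only the zero pattern of P matters,
    so the powers are tracked as boolean matrices.
    """
    n = len(P)
    step = [[P[i][j] > 0 for j in range(n)] for i in range(n)]
    reach = [row[:] for row in step]
    for _ in range((n - 1) ** 2 + 1):
        if all(all(row) for row in reach):
            return True
        # Advance to the zero pattern of the next power of P.
        reach = [
            [any(reach[i][k] and step[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)
        ]
    return all(all(row) for row in reach)
```

For instance, a two-state chain that deterministically alternates between its states is irreducible but has period 2, so the check returns `False`, whereas adding a self-loop to one state makes it return `True`. If the optimal θ had norm larger than the chosen radius, `project_to_ball` would move it, which is why the radius must be large enough to contain it.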
Feb-13-2026, 06:59:45 GMT