

 Reinforcement Learning



Provably Efficient Reinforcement Learning with Linear Function Approximation under Adaptivity Constraints

Neural Information Processing Systems

Real-world reinforcement learning (RL) applications often involve very large, possibly infinite, state and action spaces, and in that situation classical RL algorithms developed for the tabular setting no longer apply. A popular approach to overcoming this issue is to apply function approximation techniques to the underlying structure of the Markov decision process (MDP).
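To make "linear function approximation" concrete, the sketch below shows the generic idea rather than the paper's algorithm, which is not described here. Each state-action pair is mapped to a feature vector φ(s, a), the action-value function is modeled as Q(s, a) ≈ φ(s, a)ᵀw, and w is fitted by ridge regression, w = (λI + Σ φφᵀ)⁻¹ Σ φ·y. In least-squares value iteration the regression targets would be y = r + V(s′). The feature vectors, targets, and the helper names `ridge_q_weights` and `q_value` are hypothetical and chosen only for illustration; the solver is plain Python so the example has no dependencies.

```python
from typing import List, Sequence

Vector = List[float]


def solve(a: List[List[float]], b: Vector) -> Vector:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not mutated.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def ridge_q_weights(features: Sequence[Vector], targets: Sequence[float],
                    lam: float = 1.0) -> Vector:
    """Hypothetical helper: w = (lam*I + sum phi phi^T)^-1 sum phi*y."""
    d = len(features[0])
    gram = [[lam if i == j else 0.0 for j in range(d)] for i in range(d)]
    rhs = [0.0] * d
    for phi, y in zip(features, targets):
        for i in range(d):
            rhs[i] += phi[i] * y
            for j in range(d):
                gram[i][j] += phi[i] * phi[j]
    return solve(gram, rhs)


def q_value(phi: Vector, w: Vector) -> float:
    """Linear Q estimate phi(s, a)^T w."""
    return sum(p * wi for p, wi in zip(phi, w))


if __name__ == "__main__":
    # Hypothetical features phi(s, a) with targets generated by w = [2, -1].
    feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, 2.0]]
    ys = [2.0, -1.0, 1.0, 0.0]
    w = ridge_q_weights(feats, ys, lam=1e-6)
    print(w, q_value([1.0, 1.0], w))
```

With a small regularizer λ the fitted weights come out close to [2, −1]. A larger λ trades bias for stability when features are poorly covered, which is one reason regularized least squares is the usual building block in this setting.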


Appendix A Proofs

Neural Information Processing Systems

The second derivative test confirms that we have a maximum, i.e. … The proof for (b) can be found in the work of Goodfellow et al. In this section we present Adversarial Soft Q-Fitting (ASQF), a principled approach to imitation learning without reinforcement learning that relies exclusively on transitions. Using transitions rather than trajectories has several practical benefits, such as the ability to handle asynchronously collected data or non-sequential expert demonstrations. We consider the GAN objective of Eq. (5) with … The beginning of the proof closely follows the proof of Appendix A.1.
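The snippet is truncated, so the following assumes that the maximum in question is the standard step from Goodfellow et al.: for fixed a, b > 0, the function f(y) = a log y + b log(1 − y) on (0, 1) has f′(y) = a/y − b/(1 − y), which vanishes at y* = a/(a + b), and f″(y) = −a/y² − b/(1 − y)² < 0, so y* is a maximum. With a = p_data(x) and b = p_g(x) this gives the optimal discriminator D*(x) = p_data(x)/(p_data(x) + p_g(x)). The short check below compares the closed form with a grid search; the function names are illustrative only.

```python
import math


def f(y: float, a: float, b: float) -> float:
    """Pointwise GAN discriminator objective a*log(y) + b*log(1 - y)."""
    return a * math.log(y) + b * math.log(1.0 - y)


def f_second_derivative(y: float, a: float, b: float) -> float:
    """f''(y) = -a/y^2 - b/(1 - y)^2, negative on (0, 1) when a, b > 0."""
    return -a / y**2 - b / (1.0 - y) ** 2


def closed_form_argmax(a: float, b: float) -> float:
    """Stationary point of f: solving a/y = b/(1 - y) gives y* = a/(a + b)."""
    return a / (a + b)


def grid_argmax(a: float, b: float, n: int = 100_000) -> float:
    """Brute-force maximizer of f over a uniform grid strictly inside (0, 1)."""
    ys = [(i + 0.5) / n for i in range(n)]
    return max(ys, key=lambda y: f(y, a, b))


if __name__ == "__main__":
    a, b = 0.3, 0.7  # hypothetical densities p_data(x), p_g(x) at one point x
    y_star = closed_form_argmax(a, b)
    print(y_star, grid_argmax(a, b), f_second_derivative(y_star, a, b))
```

Because f is strictly concave on (0, 1), the grid maximizer lands next to a/(a + b), and the negative second derivative at y* is exactly what the second derivative test requires.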