In Appendix B, we provide sufficient conditions for Assumption 1 that were mentioned in the main

Neural Information Processing Systems 

In Appendix A we introduce some basic definitions that are needed for our theoretical results. In Appendix C and Appendix D we prove the error bounds for PPI and PQI.

All the other dynamics are preserved. Rewards are 0 for the absorbing action and unchanged elsewhere.

Algorithms 1 and 2.

As some of the notation is actually a function of the MDP, we clarify the usage

Recall the definition of the semi-norm of any function of state-action pairs.
