Structure of the supplementary material

Neural Information Processing Systems 

Appendix B provides the proofs of the results for the basic setting presented in Section 3. Appendix C provides the proofs and additional discussion for the results of the concave-convex setting presented in Section 4. Appendix F provides auxiliary concentration lemmas used in the derivation of our results.

RL, is presented in Algorithm 1. In this setting, unlike the basic setting, the objective and constraints are not linear. As before, expressing this program in terms of occupation measures yields a convex program. We define the bonus-enhanced cMDP, i.e.