Appendix A Approximation Error Analysis
In this section, we provide a complete proof of Theorem 1, quantifying the effect of function embedding of constraints in dual ...
–Neural Information Processing Systems
The proof adapts the standard LP for state-value functions to the case of the Q-LP (De Farias and Van Roy, 2003). The effect of full-rank basis embedding in the example in Section 3.1 can be justified straightforwardly. The algorithm can be generalized to undiscounted MDPs with γ = 1 and to finite-horizon MDPs. An argument similar to that of Section 3.3 for discounted MDPs can be applied to ... MDPs are strictly more general than multi-armed and contextual bandits. Karampatziakis et al. (2019) considers ... The estimator in Karampatziakis et al. (2019) is derived from empirical likelihood with reverse ... Computationally, the estimator in Karampatziakis et al. (2019) requires an extra statistic, i.e., (max ... Unfortunately, the reverse KL-divergence does not satisfy the conditions in Assumption 1.
Oct-3-2025, 03:57:38 GMT