bc6d753857fe3dd4275dff707dedf329-Supplemental.pdf

Neural Information Processing Systems

In this setting, unlike the basic setting, the objective and constraints are not linear. We focus on a single state-action pair (s, a), stage h, and objective m. Similarly, in constrained settings, the estimated resource consumptions are underestimates of the true resource consumptions.

B.5 Bounding the Bellman error

We now provide an upper bound on the Bellman error, which arises on the right-hand side of the regret decomposition (Proposition 3.3). When neither failure event occurs (which happens with probability at least 1 − 2δ), Proposition 3.3 upper bounds either the reward regret or the consumption regret by

In this section, we prove the main guarantee for the convex-concave setting.
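The excerpt relies on two ingredients: optimistic estimates of resource consumption (underestimates of the true consumption) and the Bellman error of the estimated values under the true model. The sketch below illustrates both for a tabular problem at a single stage h. It rests on assumptions not stated in the excerpt: the Hoeffding-style confidence width, the array shapes, and the function names `optimistic_consumption` and `bellman_error` are hypothetical. The Bellman error follows a common definition, Q̂_h(s, a) − (r(s, a) + Σ_{s'} p(s' | s, a) V̂_{h+1}(s')), which may differ in detail from the paper's own definition.

```python
import numpy as np


def optimistic_consumption(counts, sums, delta):
    """Hypothetical lower-confidence estimate of per-(s, a) consumption in [0, 1].

    counts: (S, A) number of visits to each pair
    sums:   (S, A) total observed consumption for each pair
    delta:  failure probability used in the (assumed) Hoeffding-style width
    """
    n = np.maximum(counts, 1)                             # avoid division by zero for unvisited pairs
    mean = sums / n                                       # empirical mean (0 for unvisited pairs)
    width = np.sqrt(np.log(2.0 / delta) / (2.0 * n))      # assumed confidence width
    return np.clip(mean - width, 0.0, 1.0)                # subtract width: an underestimate


def bellman_error(q_hat, v_hat_next, r_true, p_true):
    """Bellman error at one stage h, measured against the true model.

    q_hat:      (S, A)    estimated Q_h for the objective of interest
    v_hat_next: (S,)      estimated V_{h+1}
    r_true:     (S, A)    true expected reward or consumption
    p_true:     (S, A, S) true transition probabilities
    """
    # p_true @ v_hat_next contracts the next-state axis, giving shape (S, A)
    return q_hat - (r_true + p_true @ v_hat_next)


# Usage on a toy two-state, one-action problem
r_true = np.array([[0.5], [0.2]])
p_true = np.array([[[0.5, 0.5]], [[0.0, 1.0]]])
v_hat_next = np.array([1.0, 0.0])
q_hat = np.array([[1.2], [0.3]])
print(bellman_error(q_hat, v_hat_next, r_true, p_true))   # approximately [[0.2], [0.1]]

counts = np.array([[100], [0]])
sums = np.array([[50.0], [0.0]])
print(optimistic_consumption(counts, sums, delta=0.1))     # below the empirical means
```

Clipping at 0 means an unvisited pair is treated as consuming nothing, the most optimistic choice for a constraint; an optimistic reward estimate would instead add the width and clip at 1.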