bc6d753857fe3dd4275dff707dedf329-Supplemental.pdf

Neural Information Processing Systems 

In this setting, unlike the basic setting, the objective and constraints are not linear. We focus on a single state-action pair s, a, stage h, and objective m. Similarly, in constrained settings, its estimated resource consumptions are underestimates of the true resource consumptions.

B.5 Bounding the Bellman error

We now provide an upper bound on the Bellman error that arises on the RHS of the regret decomposition (Proposition 3.3). When neither failure event occurs (probability 1 − 2δ), Proposition 3.3 upper bounds each of the reward and consumption regret by …

In this section, we prove the main guarantee for the convex-concave setting.
