
Collaborating Authors

 hxt


Supplementary Material

Neural Information Processing Systems

Proof of Proposition 2. If $s = 0$, the result is trivial. Otherwise, using the alternative formulation $\mu_s(X) = \ln \|e^X\|_s$, we get that $s \mapsto \mu_s(X)$ is nondecreasing and $\lim_{s \to +\infty} \mu_s(X) = \ln(\operatorname{ess\,sup} e^X) = \operatorname{ess\,sup} X$. By definition of $I_X$, $\varphi_{s_0}(X)$ and $\varphi_{s_1}(X)$ are integrable. The result follows from standard analysis of non-convex gradient descent. Hence, $f(x)$ is at most the sum of both terms.
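As a numerical sanity check of the two facts used above (monotonicity in $s$ and the limit $\operatorname{ess\,sup} X$), the sketch below evaluates $\mu_s(X) = \ln \|e^X\|_s = \frac{1}{s} \ln \mathbb{E}[e^{sX}]$ for a finitely supported $X$. It assumes $\|\cdot\|_s$ is the $L^s$ norm under a probability measure and treats $s = 0$ as the limit $\mathbb{E}[X]$; the function name `mu` and the example distribution are hypothetical and serve only as an illustration.

```python
import math
from typing import Sequence


def mu(s: float, values: Sequence[float], probs: Sequence[float]) -> float:
    """Return mu_s(X) = ln ||e^X||_s = (1/s) * ln E[exp(s X)] for discrete X.

    Assumption (illustrative only): X takes values[i] with probability
    probs[i], and s = 0 is treated as the limit E[X].
    """
    # Keep only atoms with positive mass: these determine ess sup X.
    atoms = [(x, p) for x, p in zip(values, probs) if p > 0]
    if s == 0:
        return sum(p * x for x, p in atoms)
    # Log-sum-exp shift: subtract the largest exponent so exp() cannot overflow.
    shift = max(s * x for x, _ in atoms)
    total = sum(p * math.exp(s * x - shift) for x, p in atoms)
    return (shift + math.log(total)) / s


values = [-1.0, 0.5, 2.0]   # hypothetical example; ess sup X = 2.0
probs = [0.5, 0.3, 0.2]
grid = [0.0, 0.5, 1.0, 5.0, 50.0, 500.0]
mus = [mu(s, values, probs) for s in grid]

for s, m in zip(grid, mus):
    print(f"s = {s:6.1f}  mu_s(X) = {m:.5f}")
```

The printed values increase with $s$, starting at $\mathbb{E}[X] = 0.05$ and approaching $\operatorname{ess\,sup} X = 2$, which is consistent with the monotonicity and limit invoked in the proof. The log-sum-exp shift is what keeps large $s$ (here 500) from overflowing `math.exp`.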


Stage-wise Conservative Linear Bandits

Neural Information Processing Systems

For instance, compared to existing solutions, we show that SCLTS plays the (non-optimal) baseline action at most $O(\log T)$ times (compared to $O(\sqrt{T})$). Finally, we make connections to another studied form of "safety constraints" that takes the form of an upper bound on the instantaneous reward.