Stage-wise Conservative Linear Bandits

Neural Information Processing Systems 

For instance, compared to existing solutions, we show that SCLTS plays the (non-optimal) baseline action at most O(log T) times (compared to O(√T)). Finally, we make connections to another studied form of "safety constraints" that takes the form of an upper bound on the instantaneous reward.
