Stage-wise Conservative Linear Bandits
–Neural Information Processing Systems
For instance, compared to existing solutions, we show that SCLTS plays the (non-optimal) baseline action at most O(log T) times (compared to O(√T)). Finally, we make connections to another studied form of "safety constraints" that takes the form of an upper bound on the instantaneous reward.