kumar
Specifically, there now exists a tight relaxation for verifying the robustness of a neural network to ℓ∞ input perturbations, as well as efficient primal and dual solvers for the relaxation. Buoyed by this success, we consider the problem of developing similar techniques for verifying robustness to input perturbations within the probability simplex. We prove a somewhat surprising result that, in this case, not only can one design a tight relaxation that overcomes the convex barrier, but the size of the relaxation remains linear in the number of neurons, thereby leading to simpler and more efficient algorithms.
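As a rough illustration of why the simplex constraint can help, the sketch below bounds a single linear pre-activation over the probability simplex and over the enclosing unit box. It is not the relaxation proposed in the paper; it only uses the standard fact that a linear function on the simplex attains its extremes at a vertex. The function names, weights, and bias are hypothetical.

```python
# Illustrative only: bounds on one linear pre-activation w.x + b
# when x lies in the probability simplex {x >= 0, sum(x) = 1}.
# All names and numbers here are hypothetical, not from the paper.


def simplex_bounds(w, b):
    """Exact min and max of w.x + b over the probability simplex.

    A linear function on the simplex attains its extremes at a vertex
    (a one-hot vector), so the bounds are min(w) + b and max(w) + b.
    """
    return min(w) + b, max(w) + b


def box_bounds(w, b):
    """Interval bounds over the enclosing box 0 <= x_i <= 1."""
    lower = sum(min(wi, 0.0) for wi in w) + b
    upper = sum(max(wi, 0.0) for wi in w) + b
    return lower, upper


w = [2.0, -1.0, 0.5]  # hypothetical weights
b = 0.25              # hypothetical bias
simplex_lo, simplex_hi = simplex_bounds(w, b)
box_lo, box_hi = box_bounds(w, b)
```

The simplex interval is always contained in the box interval, since the simplex is a subset of the box. Both computations cost time linear in the input dimension.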
Offline Reinforcement Learning as One Big Sequence Modeling Problem
Reinforcement learning (RL) is typically concerned with estimating stationary policies or single-step models, leveraging the Markov property to factorize problems in time. However, we can also view RL as a generic sequence modeling problem, with the goal being to produce a sequence of actions that leads to a sequence of high rewards.
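To make the sequence view concrete, here is a minimal sketch that flattens a trajectory of states, actions, and rewards into one ordered sequence that a generic sequence model could consume. The interleaving order, the function name, and the toy data are assumptions for illustration; the excerpt does not specify how the paper represents trajectories.

```python
# Illustrative only: one possible way to lay out a trajectory as a
# single flat sequence. The (state, action, reward) ordering is an
# assumption, not taken from the abstract.


def flatten_trajectory(states, actions, rewards):
    """Interleave per-step state, action, and reward into one flat list."""
    sequence = []
    for state, action, reward in zip(states, actions, rewards):
        sequence.extend(state)   # state dimensions, in order
        sequence.extend(action)  # action dimensions, in order
        sequence.append(reward)  # scalar reward for this step
    return sequence


# Hypothetical two-step trajectory with 2-D states and 1-D actions.
states = [[0.1, 0.2], [0.3, 0.4]]
actions = [[1.0], [0.0]]
rewards = [0.5, 1.0]
tokens = flatten_trajectory(states, actions, rewards)
```

Under this layout, predicting the next element of the sequence covers state, action, and reward prediction alike, so no per-step Markov factorization is built into the data format.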