Provable Model-based Nonlinear Bandit and Reinforcement Learning: Shelve Optimism, Embrace Virtual Curvature
