Neural Information Processing Systems 

We observe empirically that doubling n requires doubling m to get policies of similar quality.

Feedback 2: Theorem 4 is an instance-dependent upper bound on the n-round regret of SoftElim.