Q0(X))withthesecond moment regularizer while O(q
–Neural Information Processing Systems
"the learner knows the logging policy": This isafairly standard assumption inprior work [1,5,19,22] and holds for10 exampleinonlineadvertisement.
Neural Information Processing Systems
Feb-11-2026, 11:55:58 GMT