0c72cb7ee1512f800abe27823a792d03-Supplemental.pdf
–Neural Information Processing Systems
However, for the recommender system experiment, there are no natural representations for the candidate models. IS-g/DR-g Off-policy evaluation (OPE) methods can provide an estimate of the accumulative metric. The resulting methods are denoted as IS-EI and DR-EI respectively. As there is limited information to be gained by repeatedly deploying the same model online, we exclude the models that have already been deployed when choosing the next model to deploy, for all the methods including AOE. We simulate the "online" deployment scenario as follows: a multi-class classifier is given a set of inputs; for each input, the classifier returns a predicted label, and the only feedback available is an immediate binary signal indicating whether the predicted class is correct. The y-axis shows the gap in the accumulative metric between the optimal model and the best model estimated by each method.
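The simulated deployment described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the accumulative metric is assumed here to be the mean of the binary feedback, and `scores` stands in for whatever acquisition value a selection method assigns to each candidate model.

```python
import numpy as np

def simulate_binary_feedback(predict, inputs, labels):
    """Deploy a classifier "online": the true labels stay hidden and only a
    0/1 signal per input (was the predicted class correct?) is returned."""
    predictions = predict(inputs)
    return (predictions == labels).astype(float)

def accumulative_metric(feedback):
    """Assumption: the accumulative metric is the average immediate feedback."""
    return float(np.mean(feedback))

def choose_next(scores, deployed):
    """Pick the highest-scoring candidate, excluding already-deployed models."""
    masked = np.array(scores, dtype=float)
    idx = np.array(sorted(deployed), dtype=int)
    masked[idx] = -np.inf
    return int(np.argmax(masked))

def gap_to_optimal(true_metrics, estimated_best):
    """Gap between the optimal model's metric and that of the model a method
    believes is best (the quantity plotted on the y-axis)."""
    true_metrics = np.asarray(true_metrics, dtype=float)
    return float(np.max(true_metrics) - true_metrics[estimated_best])

# Toy usage with two hypothetical candidate classifiers.
inputs = np.array([0, 1, 2, 3])
labels = np.array([0, 1, 0, 1])
candidates = [lambda x: x % 2, lambda x: np.zeros_like(x)]
metrics = [
    accumulative_metric(simulate_binary_feedback(f, inputs, labels))
    for f in candidates
]
```

A method that picks a suboptimal model is then scored by `gap_to_optimal(metrics, chosen_index)`, which is zero only when the chosen model attains the best metric.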
Feb-7-2026, 11:02:56 GMT