A Simpler Alternative to Variational Regularized Counterfactual Risk Minimization

Hua Chang Bakker, Shashank Gupta, Harrie Oosterhuis

Sep-15-2024, arXiv.org Artificial Intelligence

Variance regularized counterfactual risk minimization (VRCRM) has been proposed as an alternative off-policy learning (OPL) method. The VRCRM method uses a lower bound on the $f$-divergence between the logging policy and the target policy as regularization during learning, and was shown to improve performance over existing OPL alternatives on multi-label classification tasks. In this work, we revisit the original experimental setting of VRCRM and propose to minimize the $f$-divergence directly, instead of optimizing the lower bound with an $f$-GAN approach. Surprisingly, we were unable to reproduce the results reported in the original setting. In response, we propose a novel, simpler alternative for $f$-divergence optimization that minimizes a direct approximation of the $f$-divergence instead of an $f$-GAN-based lower bound. Our experiments show that minimizing the divergence using $f$-GANs did not work as expected, whereas our proposed simpler alternative works better empirically.
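
To make the idea of a direct approximation concrete, the following is a minimal sketch and not the authors' implementation. It relies on the standard definition $D_f(\pi \| \pi_0) = \mathbb{E}_{a \sim \pi_0}[f(\pi(a)/\pi_0(a))]$, which can be estimated by averaging over actions sampled from the logging policy, with no discriminator network involved. The three-action policies, the sample size, and all function names are hypothetical and chosen only for illustration; the abstract does not state which estimator or which $f$ the paper uses.

```python
import math
import random


def f_chi2(t):
    """Generator of the Pearson chi-squared divergence: f(t) = (t - 1)^2."""
    return (t - 1.0) ** 2


def f_kl(t):
    """Generator of the KL divergence: f(t) = t * log(t), with f(0) = 0."""
    return t * math.log(t) if t > 0 else 0.0


def exact_f_divergence(target, logging, f):
    """D_f(target || logging) = sum_a logging(a) * f(target(a) / logging(a))."""
    return sum(q * f(p / q) for p, q in zip(target, logging))


def sampled_f_divergence(target, logging, f, actions):
    """Monte Carlo estimate using actions drawn from the logging policy."""
    return sum(f(target[a] / logging[a]) for a in actions) / len(actions)


# Hypothetical three-action policies for a single context (assumption).
logging_policy = [0.5, 0.3, 0.2]
target_policy = [0.2, 0.3, 0.5]

# Simulate logged data: actions sampled from the logging policy.
rng = random.Random(0)
logged_actions = rng.choices(range(3), weights=logging_policy, k=20000)

exact_chi2 = exact_f_divergence(target_policy, logging_policy, f_chi2)
approx_chi2 = sampled_f_divergence(
    target_policy, logging_policy, f_chi2, logged_actions
)
exact_kl = exact_f_divergence(target_policy, logging_policy, f_kl)

print(f"chi-squared: exact={exact_chi2:.4f}, sampled={approx_chi2:.4f}")
print(f"KL:          exact={exact_kl:.4f}")
```

In a learning setup, a sampled estimate of this kind could serve as the regularization term added to the counterfactual risk, whereas an $f$-GAN approach would instead train a separate critic to maximize a variational lower bound on the same quantity.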

Country:
- Europe
  - Finland (0.04)
  - Netherlands
    - North Holland > Amsterdam (0.06)
    - Gelderland > Nijmegen (0.04)

Genre:
- Research Report > New Finding (0.49)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
