Reviews: Preference Based Adaptation for Learning Objectives
Neural Information Processing Systems
Summary: The authors consider the problem of optimizing a linear combination of multiple objective functions, where these objectives are typically surrogate loss functions for machine learning tasks. In this setting, the decision maker explores and exploits over linear combinations in a dueling bandit framework: at each time step, the decision maker tests two hypotheses generated from two linear combinations and then receives feedback on whether the first or the second hypothesis is better. The main contribution of the paper is a set of online algorithms for this dueling bandit problem, in which the preference between the two tested hypotheses is modeled by a binary logistic choice model. To avoid retraining a hypothesis for every different linear combination, the authors adapt a boosting algorithm that optimizes a mixture of K different hypotheses, where each hypothesis stems from optimizing one surrogate function.

Major Comment: I find the paper quite interesting in terms of both the problem model and the analysis, and I am more inclined towards acceptance than rejection.
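The dueling setup and the binary logistic choice model summarized above can be made concrete with a small sketch. It is not the paper's algorithm; it assumes (my assumptions, not taken from the paper) that each weight vector w on the simplex has a linear utility theta·w, that the mixed hypothesis scores an input as the weighted sum of the K base hypotheses' outputs, and that theta is estimated by plain online logistic regression on duel outcomes with randomly drawn pairs, i.e. pure exploration with no exploitation. All function names (`mixture_predict`, `preference_prob`, `update`, `simulate`) are hypothetical.

```python
import math
import random


def mixture_predict(weights, base_predictions):
    """Hypothetical mixed hypothesis: sum_k w_k * h_k(x) over K base outputs."""
    return sum(w * p for w, p in zip(weights, base_predictions))


def preference_prob(theta, w1, w2):
    """Binary logistic choice model: P(hypothesis from w1 beats the one from w2).

    Assumes a linear utility u(w) = theta . w over the combination weights.
    """
    diff = sum(t * (a - b) for t, a, b in zip(theta, w1, w2))
    return 1.0 / (1.0 + math.exp(-diff))


def update(theta, w1, w2, first_won, lr=0.5):
    """One online logistic-regression (SGD) step on a single duel outcome."""
    p = preference_prob(theta, w1, w2)
    y = 1.0 if first_won else 0.0
    # Gradient of the log-likelihood w.r.t. theta is (y - p) * (w1 - w2).
    return [t + lr * (y - p) * (a - b) for t, a, b in zip(theta, w1, w2)]


def random_simplex_point(k, rng):
    """Uniform point on the (k-1)-simplex via normalized exponential draws."""
    draws = [rng.expovariate(1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def simulate(true_theta, rounds=2000, seed=0):
    """Duel random pairs of weight vectors and learn theta from the outcomes.

    Because every w sums to 1, only differences between theta components
    are identifiable; starting from zeros keeps the estimate summing to ~0.
    """
    rng = random.Random(seed)
    k = len(true_theta)
    theta_hat = [0.0] * k
    for _ in range(rounds):
        w1 = random_simplex_point(k, rng)
        w2 = random_simplex_point(k, rng)
        # Simulated feedback drawn from the (unknown) true logistic model.
        first_won = rng.random() < preference_prob(true_theta, w1, w2)
        theta_hat = update(theta_hat, w1, w2, first_won)
    return theta_hat
```

For example, `simulate([2.0, 0.0, -1.0])` returns an estimate whose component differences should roughly track those of the true utility as the number of rounds grows. A dueling bandit algorithm of the kind the review describes would instead choose the pair (w1, w2) adaptively to balance exploration and exploitation, rather than sampling it uniformly.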
Oct-8-2024, 05:22:14 GMT