Online Preselection with Context Information under the Plackett-Luce Model

Adil El Mesaoudi-Paul, Viktor Bengs, Eyke Hüllermeier

arXiv.org, Machine Learning

In machine learning, the notion of multi-armed bandits (MAB) refers to a class of online learning problems in which a learner must simultaneously explore and exploit a given set of choice alternatives (metaphorically referred to as "arms") in the course of a sequential decision process (Lattimore and Szepesvári, 2019). In this paper, we consider an extension of the basic setting, practically motivated by the problem of preselection as recently introduced by Saha and Gopalan (2018b) and Bengs and Hüllermeier (2019): instead of selecting a single arm, the learner is only supposed to preselect a promising subset of arms. The final choice is then made by a selector, for example a human user or another algorithm. In information retrieval, for instance, the role of the learner is played by a search engine, and the selector is the user seeking a particular piece of information. Another application, which served as a concrete motivation for our setting and will also be used in our experimental study, is the problem of algorithm (pre-)selection (Kerschke et al., 2018).
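To make the setting concrete, the following is a minimal sketch (not the paper's algorithm) of the Plackett-Luce model underlying it: each arm carries a latent utility weight, a full ranking is sampled by drawing arms one at a time with probability proportional to their weights among the remaining arms, and a preselection can then be read off as the top-k prefix of such a ranking. The function names and the weight values are illustrative assumptions.

```python
import random


def sample_pl_ranking(weights, rng):
    """Sample a ranking under the Plackett-Luce model.

    Arms are drawn sequentially; at each step, arm i is chosen
    with probability weights[i] / (sum of weights of remaining arms).
    """
    remaining = list(range(len(weights)))
    ranking = []
    while remaining:
        total = sum(weights[i] for i in remaining)
        r = rng.random() * total
        acc = 0.0
        for i in remaining:
            acc += weights[i]
            if r <= acc:
                ranking.append(i)
                remaining.remove(i)
                break
    return ranking


def preselect_top_k(weights, k, rng):
    """Illustrative preselection: offer the selector the top-k prefix
    of a sampled Plackett-Luce ranking (a hypothetical rule, not the
    paper's learner)."""
    return sample_pl_ranking(weights, rng)[:k]


# Example: 4 arms with (assumed) utility weights; preselect 2 of them.
rng = random.Random(42)
subset = preselect_top_k([1.0, 3.0, 2.0, 0.5], k=2, rng=rng)
print(subset)  # a 2-element subset of {0, 1, 2, 3}
```

Arms with larger weights are more likely to appear early in the sampled ranking, so high-utility arms are preselected more often, which is the statistical handle a bandit learner can exploit.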
