Online Preselection with Context Information under the Plackett-Luce Model
Adil El Mesaoudi-Paul, Viktor Bengs, Eyke Hüllermeier
In machine learning, the notion of multi-armed bandits (MAB) refers to a class of online learning problems in which a learner is supposed to simultaneously explore and exploit a given set of choice alternatives (metaphorically referred to as "arms") in the course of a sequential decision process (Lattimore and Szepesvári, 2019). In this paper, we consider an extension of the basic setting, practically motivated by the problem of preselection as recently introduced by Saha and Gopalan (2018b) and Bengs and Hüllermeier (2019): instead of selecting a single arm, the learner is only supposed to preselect a promising subset of arms. The final choice is then made by a selector, for example a human user or another algorithm. In information retrieval, for instance, the role of the learner is played by a search engine, and the selector is the user who seeks specific information. Another application, which served as a concrete motivation for our setting and will also be used in our experimental study, is the problem of algorithm (pre-)selection (Kerschke et al., 2018).
Feb-11-2020
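To make the setting concrete, here is a minimal sketch of the Plackett-Luce feedback model underlying the title. It assumes each arm i carries a latent skill parameter v_i (the function and variable names are illustrative, not taken from the paper): the selector picks arm i from a preselected subset with probability proportional to v_i, and a full ranking of the subset is obtained by repeating that choice over the remaining arms.

```python
import random


def pl_choice_prob(v, subset, i):
    """Probability that the selector picks arm i from the preselected
    subset under the Plackett-Luce model with skill parameters v."""
    return v[i] / sum(v[j] for j in subset)


def sample_pl_ranking(v, subset, rng=None):
    """Sample a full ranking of the subset: repeatedly draw the next
    item with probability proportional to its skill among the rest."""
    rng = rng or random.Random(0)
    remaining = list(subset)
    ranking = []
    while remaining:
        pick = rng.choices(remaining, weights=[v[j] for j in remaining])[0]
        ranking.append(pick)
        remaining.remove(pick)
    return ranking
```

For example, with skills v = [2.0, 1.0, 1.0] and the full subset [0, 1, 2], arm 0 is chosen first with probability 2 / (2 + 1 + 1) = 0.5; the learner's task in the preselection bandit problem is to identify a good subset from such top-rank (or ranking) feedback without knowing v.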