Leveraging User-Triggered Supervision in Contextual Bandits

Agarwal, Alekh, Gentile, Claudio, Marinov, Teodor V.

arXiv.org Artificial Intelligence 

How should we leverage such an extra modality of feedback along with the typical reward signal in CBs? We study contextual bandit (CB) problems, While prior works have developed hybrid models such as where the user can sometimes respond with the learning with feedback graphs (e.g., (Mannor & Shamir, best action in a given context. Such an interaction 2011; Caron et al., 2012; Alon et al., 2017)) to capture a arises, for example, in text prediction or autocompletion continuum between supervised and CB learning, such settings settings, where a poor suggestion is simply are not a natural fit here. A key challenge in the ignored and the user enters the desired text feedback structure is that the extra supervised signal is only instead. Crucially, this extra feedback is usertriggered available on a subset of the contexts, which are chosen by on only a subset of the contexts. We develop the user as some unknown function of the algorithm's recommended a new framework to leverage such signals,

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found