Leveraging User-Triggered Supervision in Contextual Bandits

Alekh Agarwal, Claudio Gentile, Teodor V. Marinov

Feb-7-2023–arXiv.org Artificial Intelligence

We study contextual bandit (CB) problems where the user can sometimes respond with the best action in a given context. Such an interaction arises, for example, in text prediction or autocompletion settings, where a poor suggestion is simply ignored and the user enters the desired text instead. Crucially, this extra feedback is user-triggered on only a subset of the contexts. We develop a new framework to leverage such signals, …

How should we leverage such an extra modality of feedback along with the typical reward signal in CBs? Prior works have developed hybrid models such as learning with feedback graphs (e.g., Mannor & Shamir, 2011; Caron et al., 2012; Alon et al., 2017) to capture a continuum between supervised and CB learning, but such settings are not a natural fit here. A key challenge in the feedback structure is that the extra supervised signal is only available on a subset of the contexts. The user chooses that subset as some unknown function of the algorithm's recommended …
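The following Python sketch is a hedged illustration of the interaction protocol described above. It is not the authors' framework or algorithm; it simulates a toy version of the setting.

Everything in it is an assumption made for illustration:
- The hypothetical user model `user_response` treats `x % n_actions` as the best action for context `x`. It accepts a correct suggestion (reward 1). Otherwise it ignores the suggestion (reward 0) and reveals the best action.
- The hypothetical learner inside `run` simply memorizes revealed labels per context.

```python
import random
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Feedback:
    reward: float               # usual bandit reward for the recommended action
    best_action: Optional[int]  # user-triggered supervision; None when not given


def user_response(context: int, recommended: int, n_actions: int) -> Feedback:
    """Hypothetical user model (an assumption, not from the paper).

    The best action for a context is context % n_actions. A correct
    suggestion is accepted (reward 1, no extra signal). A poor one is
    ignored (reward 0) and the user enters the desired answer, which
    reveals the best action.
    """
    best = context % n_actions
    if recommended == best:
        return Feedback(reward=1.0, best_action=None)
    return Feedback(reward=0.0, best_action=best)


def run(n_rounds: int, n_contexts: int, n_actions: int,
        seed: int = 0) -> Tuple[float, int, Dict[int, int]]:
    """Toy interaction loop with a memorizing learner (hypothetical)."""
    rng = random.Random(seed)
    learned: Dict[int, int] = {}  # context -> best action revealed by the user
    supervised_rounds = 0
    total_reward = 0.0
    for _ in range(n_rounds):
        x = rng.randrange(n_contexts)
        # Recommend the revealed label if we have one, else a random action.
        a = learned[x] if x in learned else rng.randrange(n_actions)
        fb = user_response(x, a, n_actions)
        total_reward += fb.reward
        if fb.best_action is not None:
            # Supervision arrives only on rounds where the suggestion was poor.
            supervised_rounds += 1
            learned[x] = fb.best_action
    return total_reward, supervised_rounds, learned
```

For example, `run(n_rounds=1000, n_contexts=20, n_actions=5, seed=1)` returns three values: the total reward, the number of rounds in which the user supplied the best action, and the learned table. In this toy model, supervision only appears on rounds where the recommendation was wrong. Which contexts get labeled therefore depends on the learner's own recommendations. A real CB learner would also need to generalize across contexts rather than memorize them.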

artificial intelligence, leveraging user-triggered supervision, machine learning

Genre:
- Research Report (0.64)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning (1.00)
  - Representation & Reasoning > Optimization (0.93)
