Neural Contextual Bandits Under Delayed Feedback Constraints

Moghimi, Mohammadali; Jose, Sharu Theresa; Moothedath, Shana

arXiv.org Artificial Intelligence 

This paper presents a new algorithm for neural contextual bandits (CBs) that addresses the challenge of delayed reward feedback, where the reward for a chosen action is revealed only after a random, unknown delay. This scenario is common in applications such as online recommendation systems and clinical trials, where the outcomes of an action (such as a user's response to a recommendation or a patient's response to a treatment) take time to manifest and be measured. The proposed algorithm, called Delayed NeuralUCB, uses an upper confidence bound (UCB)-based exploration strategy. We further consider a variant of the algorithm, called Delayed NeuralTS, that uses Thompson Sampling-based exploration. Numerical experiments on real-world datasets, such as MNIST and Mushroom, along with comparisons to benchmark approaches, demonstrate that the proposed algorithms effectively manage varying delays and are well-suited for complex real-world scenarios.

The stochastic contextual bandit (CB) problem has gained immense interest in recent years due to its application in various domains, including healthcare, finance, and recommender systems [1]-[5]. The CB is a sequential decision-making problem where, in each round, the agent (or decision-maker) is presented with K actions and associated contextual information.
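To make the delayed-feedback setting concrete, the following is a minimal sketch of the interaction loop, not the paper's Delayed NeuralUCB. It assumes a linear reward model with a LinUCB-style confidence bonus in place of a neural network, synthetic unit-norm contexts, and delays drawn uniformly from 0 to `max_delay`. The function name `run_delayed_linucb` and all of its parameters are hypothetical, chosen only for illustration.

```python
import heapq
import numpy as np


def run_delayed_linucb(n_rounds=2000, n_arms=4, dim=5, max_delay=20,
                       alpha=1.0, seed=0):
    """Linear UCB stand-in for a delayed-feedback contextual bandit.

    Assumptions (not from the paper): linear rewards, synthetic contexts,
    uniform random delays. Returns (regret, n_applied, n_pending).
    """
    rng = np.random.default_rng(seed)
    theta_star = rng.normal(size=dim)
    theta_star /= np.linalg.norm(theta_star)  # hidden reward parameter

    A = np.eye(dim)    # regularised Gram matrix of observed contexts
    b = np.zeros(dim)  # sum of reward-weighted observed contexts
    pending = []       # min-heap of (arrival_round, tiebreak, x, reward)
    regret, applied = 0.0, 0

    for t in range(n_rounds):
        # 1. Absorb only the feedback whose delay has elapsed by round t.
        while pending and pending[0][0] <= t:
            _, _, x, r = heapq.heappop(pending)
            A += np.outer(x, x)
            b += r * x
            applied += 1

        # 2. Observe one unit-norm context vector per arm.
        X = rng.normal(size=(n_arms, dim))
        X /= np.linalg.norm(X, axis=1, keepdims=True)

        # 3. UCB score = estimated reward + exploration bonus.
        A_inv = np.linalg.inv(A)
        theta_hat = A_inv @ b
        bonus = np.sqrt(np.einsum("ij,jk,ik->i", X, A_inv, X))
        arm = int(np.argmax(X @ theta_hat + alpha * bonus))

        # 4. The reward is generated now but revealed after a random delay.
        means = X @ theta_star
        reward = means[arm] + 0.1 * rng.normal()
        delay = int(rng.integers(0, max_delay + 1))
        # The round index t is a unique tiebreak, so arrays are never compared.
        heapq.heappush(pending, (t + 1 + delay, t, X[arm], reward))

        regret += means.max() - means[arm]

    return regret, applied, len(pending)


if __name__ == "__main__":
    total_regret, n_applied, n_pending = run_delayed_linucb()
    print(f"regret={total_regret:.2f} applied={n_applied} pending={n_pending}")
```

The key point of the design is step 1: the model is updated from the queue of arrived rewards rather than from the action just taken, so in each round the agent acts on a model that lags behind its own history. Replacing the linear estimate and bonus with a neural network and a gradient-based confidence term would move this sketch toward the neural setting the paper studies.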
