Neural Contextual Bandits Under Delayed Feedback Constraints
Mohammadali Moghimi, Sharu Theresa Jose, Shana Moothedath
arXiv.org Artificial Intelligence
This paper presents a new algorithm for neural contextual bandits (CBs) that addresses the challenge of delayed reward feedback, where the reward for a chosen action is revealed only after a random, unknown delay. This scenario is common in applications such as online recommendation systems and clinical trials, where feedback is delayed because the outcome of an action (such as a user's response to a recommendation or a patient's response to a treatment) takes time to manifest and be measured. The proposed algorithm, called Delayed NeuralUCB, uses an upper confidence bound (UCB)-based exploration strategy. We further consider a variant of the algorithm, called Delayed NeuralTS, that uses Thompson Sampling-based exploration. Numerical experiments on real-world datasets, such as MNIST and Mushroom, along with comparisons to benchmark approaches, demonstrate that the proposed algorithms effectively manage varying delays and are well-suited for complex real-world scenarios.

The stochastic contextual bandit (CB) problem has gained immense interest in recent years due to its applications in various domains, including healthcare, finance, and recommender systems [1]-[5]. The CB is a sequential decision-making problem where, in each round, the agent (or decision-maker) is presented with K actions and their associated contextual information.
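To make the delayed-feedback setting concrete, here is a minimal Python sketch of a UCB agent that can only learn from rewards once their delay has elapsed. It is not the authors' Delayed NeuralUCB: as a simplifying assumption, the neural network is swapped for a linear reward model (a LinUCB-style learner), delays are drawn uniformly from 1 to `max_delay` rounds, and the names `DelayedLinUCB`, `simulate`, `alpha`, `reg`, and `max_delay`, along with all parameter values, are hypothetical. The environment buffers each chosen action's reward and releases it to the agent only when its delay expires, so the delay itself is never shown to the learner; in the meantime, the UCB bonus keeps driving exploration.

```python
import math
import random
from collections import defaultdict


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def matvec(m, v):
    """Matrix-vector product for a list-of-lists matrix."""
    return [dot(row, v) for row in m]


class DelayedLinUCB:
    """Hypothetical linear-UCB agent; a linear model stands in for the
    neural network of Delayed NeuralUCB (illustrative simplification)."""

    def __init__(self, dim, alpha=1.0, reg=1.0):
        self.dim = dim
        self.alpha = alpha  # scale of the exploration bonus
        # Inverse design matrix A^{-1}, initialised to (reg * I)^{-1}.
        self.a_inv = [[1.0 / reg if i == j else 0.0 for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim  # sum of reward * context over revealed rewards

    def select(self, contexts):
        """Return the index of the arm with the largest UCB score."""
        theta = matvec(self.a_inv, self.b)  # ridge estimate A^{-1} b
        scores = []
        for x in contexts:
            width = math.sqrt(max(dot(x, matvec(self.a_inv, x)), 0.0))
            scores.append(dot(theta, x) + self.alpha * width)
        return max(range(len(contexts)), key=scores.__getitem__)

    def update(self, x, r):
        """Incorporate one revealed (context, reward) pair."""
        # Sherman-Morrison rank-one update of A^{-1} after A <- A + x x^T.
        ax = matvec(self.a_inv, x)
        denom = 1.0 + dot(x, ax)
        for i in range(self.dim):
            for j in range(self.dim):
                self.a_inv[i][j] -= ax[i] * ax[j] / denom
        for i in range(self.dim):
            self.b[i] += r * x[i]


def simulate(rounds=2000, k=5, dim=4, max_delay=20, seed=0):
    """Synthetic linear bandit with random reward delays.

    Returns (cumulative regret, number of rewards revealed to the agent).
    """
    rng = random.Random(seed)
    true_theta = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    agent = DelayedLinUCB(dim, alpha=0.5)
    # The environment, not the agent, holds rewards until their delay expires.
    pending = defaultdict(list)  # reveal round -> [(context, reward), ...]
    regret, revealed = 0.0, 0
    for t in range(rounds):
        for x, r in pending.pop(t, []):  # rewards arriving at start of round t
            agent.update(x, r)
            revealed += 1
        contexts = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(k)]
        means = [dot(true_theta, x) for x in contexts]
        arm = agent.select(contexts)
        reward = means[arm] + rng.gauss(0.0, 0.1)
        delay = rng.randint(1, max_delay)  # random and never shown to the agent
        pending[t + delay].append((contexts[arm], reward))
        regret += max(means) - means[arm]
    return regret, revealed


if __name__ == "__main__":
    regret, revealed = simulate()
    print(f"cumulative regret {regret:.2f}, rewards revealed {revealed}")
```

Rewards whose delay extends past the final round are simply never observed, which mirrors the censoring that delayed feedback causes in practice. A Thompson Sampling variant in the spirit of Delayed NeuralTS would, roughly, replace the optimistic bonus in `select` with a sample drawn from an approximate posterior over the model; that variant is not shown here.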
Apr-17-2025
- Country:
- North America > United States (0.28)
- Genre:
- Research Report > New Finding (0.34)