Neural Contextual Bandits Under Delayed Feedback Constraints

Moghimi, Mohammadali; Jose, Sharu Theresa; Moothedath, Shana

Apr-17-2025–arXiv.org Artificial Intelligence

This paper presents a new algorithm for neural contextual bandits (CBs) that addresses the challenge of delayed reward feedback, where the reward for a chosen action is revealed after a random, unknown delay. This scenario is common in applications such as online recommendation systems and clinical trials, where the outcomes of a user's actions (such as responses to recommendations or treatments) take time to manifest and be measured. The proposed algorithm, called Delayed NeuralUCB, uses an upper confidence bound (UCB)-based exploration strategy. We further consider a variant of the algorithm, called Delayed NeuralTS, that uses Thompson Sampling-based exploration. Numerical experiments on real-world datasets, such as MNIST and Mushroom, along with comparisons to benchmark approaches, demonstrate that the proposed algorithms effectively manage varying delays and are well-suited for complex real-world scenarios.

The stochastic contextual bandit (CB) problem has gained immense interest in recent years due to its application in various domains, including healthcare, finance, and recommender systems [1]-[5]. The CB is a sequential decision-making problem where, in each round, the agent (or decision-maker) is presented with K actions and associated contextual information.
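The delayed-feedback interaction described above can be illustrated with a minimal sketch. It is not the paper's Delayed NeuralUCB or Delayed NeuralTS: the action choice is a uniform-random placeholder, and the names `DelayedFeedbackBuffer` and `run`, the two-dimensional contexts, the reward model, and the uniform delay distribution are all assumptions made for illustration. The sketch only shows how rewards might be held back until their delay has elapsed and then handed to the learner.

```python
import heapq
import random


class DelayedFeedbackBuffer:
    """Hypothetical helper: holds rewards until their random delay has elapsed."""

    def __init__(self):
        self._heap = []  # entries: (arrival_round, seq, context, action, reward)
        self._seq = 0    # tie-breaker so heapq never has to compare contexts

    def push(self, round_t, delay, context, action, reward):
        """Store feedback generated in round_t; it becomes visible at round_t + delay."""
        heapq.heappush(
            self._heap, (round_t + delay, self._seq, context, action, reward)
        )
        self._seq += 1

    def pop_arrived(self, round_t):
        """Return every (context, action, reward) whose arrival round is <= round_t."""
        arrived = []
        while self._heap and self._heap[0][0] <= round_t:
            _, _, context, action, reward = heapq.heappop(self._heap)
            arrived.append((context, action, reward))
        return arrived

    def __len__(self):
        return len(self._heap)


def run(num_rounds=50, num_actions=3, max_delay=5, seed=0):
    """Simulate the interaction loop; all defaults are illustrative assumptions."""
    rng = random.Random(seed)
    buffer = DelayedFeedbackBuffer()
    observed = []  # feedback the learner has actually seen so far
    for t in range(num_rounds):
        # 1) Collect feedback that has arrived by this round; a real learner
        #    would update its reward model on these samples.
        observed.extend(buffer.pop_arrived(t))
        # 2) Observe one context per action and choose an action.
        #    Placeholder policy: uniform random (a UCB or Thompson Sampling
        #    rule would replace this line).
        contexts = [[rng.random() for _ in range(2)] for _ in range(num_actions)]
        action = rng.randrange(num_actions)
        # 3) The reward is generated now but revealed only after a random delay
        #    (assumed uniform on {0, ..., max_delay}).
        reward = sum(contexts[action]) + rng.gauss(0.0, 0.1)
        delay = rng.randint(0, max_delay)
        buffer.push(t, delay, contexts[action], action, reward)
    return observed, len(buffer)
```

Calling `run()` returns the feedback the learner has seen together with the number of rewards still pending when the horizon ends. Keying the heap on the arrival round keeps each round's lookup cheap even when many rewards are outstanding.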

artificial intelligence, machine learning, reinforcement learning

Country:
- North America > United States (0.28)

Genre:
- Research Report > New Finding (0.34)

Industry:
- Health & Medicine > Pharmaceuticals & Biotechnology (0.56)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Neural Networks (0.95)
  - Reinforcement Learning (0.76)
