Regret Bounds for Adversarial Contextual Bandits with General Function Approximation and Delayed Feedback

Jun-15-2026, 23:29:56 GMT–Neural Information Processing Systems

We present regret minimization algorithms for the contextual multi-armed bandit (CMAB) problem over K actions in the presence of delayed feedback, a scenario where loss observations arrive with delays chosen by an adversary. As a preliminary result, assuming direct access to a finite policy class Π we establish an optimal expected regret bound of O( p KT log|Π|+ p Dlog|Π|) where D is the sum of delays. For our main contribution, we study the general function approximation setting over a (possibly infinite) contextual loss function class F with access to an online least-square regression oracle O over F. In this setting, we achieve an expected regret bound of O( p KTRT(O) + dmaxDβ) assuming FIFO order, where dmax is the maximal delay, RT(O) is an upper bound on the oracle's regret and β is a stability parameter associated with the oracle. We complement this general result by presenting a novel stability analysis of a Hedge-based version of Vovk's aggregating forecaster as an oracle implementation for least-square regression over a finite function class F and show that its stability parameter β is bounded by log|F|, resulting in an expected regret bound of O( p KT log|F|+ p dmaxDlog|F|) which is a dmax factor away from the lower bound of Ω( p KT log|F|+ p Dlog|F|)that we also present.

artificial intelligence, fuzzy logic, machine learning, (16 more...)

Neural Information Processing Systems

Jun-15-2026, 23:29:56 GMT

Conferences PDF

Add feedback

Country:
- Europe > Austria (0.28)
- North America
  - United States (0.46)
  - Canada (0.28)

Genre:
- Research Report > Experimental Study (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning (1.00)
  - Representation & Reasoning > Uncertainty
    - Fuzzy Logic (0.62)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found