Online EXP3 Learning in Adversarial Bandits with Delayed Feedback

Ilai Bistritz, Zhengyuan Zhou, Xi Chen, Nicholas Bambos, Jose Blanchet

Feb-13-2026, 14:15:39 GMT–Neural Information Processing Systems

Consider a player that in each of T rounds chooses one of K arms. An adversary chooses the cost of each arm in a bounded interval, and a sequence of feedback delays {d_t} that are unknown to the player. After picking arm a_t at round t, the player receives the cost of playing this arm d_t rounds later. In cases where t + d_t > T, this feedback is simply missing.
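The setting above can be illustrated with a minimal sketch of an EXP3-style learner that only updates its cost estimates when delayed feedback arrives. This is not taken from the paper: the function name `exp3_delayed`, the fixed step size `eta`, the assumption that costs lie in [0, 1], and the choice to importance-weight each arriving cost by the probability the arm had when it was played are all assumptions made for illustration. Rounds are 0-indexed in the code, so the condition t + d_t > T becomes `t + delays[t] >= T`.

```python
import math
import random
from collections import defaultdict


def exp3_delayed(costs, delays, eta, seed=0):
    """Illustrative EXP3 with delayed feedback (hypothetical helper).

    costs[t][k]: cost of arm k at round t, assumed to lie in [0, 1].
    delays[t]:   non-negative integer delay d_t of round t's feedback.
    eta:         fixed step size (an assumption, not the paper's tuning).
    Returns (total cost incurred, arm distribution used in the last round).
    """
    rng = random.Random(seed)
    T, K = len(costs), len(costs[0])
    cum_est = [0.0] * K            # cumulative importance-weighted cost estimates
    arrivals = defaultdict(list)   # arrival round -> [(arm, cost, play-time prob)]
    total_cost = 0.0
    p = [1.0 / K] * K

    for t in range(T):
        # Exponential weights over estimated cumulative costs
        # (shifted by the minimum for numerical stability).
        m = min(cum_est)
        w = [math.exp(-eta * (c - m)) for c in cum_est]
        s = sum(w)
        p = [x / s for x in w]

        arm = rng.choices(range(K), weights=p)[0]
        cost = costs[t][arm]
        total_cost += cost

        # Feedback for round t arrives d_t rounds later; if that falls
        # beyond the horizon, it is simply never observed.
        arrive = t + delays[t]
        if arrive < T:
            arrivals[arrive].append((arm, cost, p[arm]))

        # Apply all feedback arriving at the end of this round.
        for a, c, pa in arrivals.pop(t, []):
            cum_est[a] += c / pa

    return total_cost, p
```

For example, calling `exp3_delayed([[0.0, 1.0]] * 500, [3] * 500, eta=0.1)` runs the learner on a two-armed instance where every cost is observed three rounds late. If every delay exceeds the horizon, no estimate is ever updated and the distribution stays uniform.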

algorithm, artificial intelligence, machine learning


