Reviews: Online EXP3 Learning in Adversarial Bandits with Delayed Feedback

Jan-26-2025, 10:43:04 GMT–Neural Information Processing Systems

However, I have a major issue with the proposed algorithms, which seem erroneous because of the choice of the learning rate (\eta): it requires knowledge of the delays (d_t), but according to the problem formulation these are unknown to the learner. The optimal learning rate seems to depend on the delays (d_t), e.g. Thm 1, Line-146 etc. Handling unknown delays is where the major challenge of the problem lies, so if the proposed technique requires knowing them, the claims of the paper are vacuous and the whole technique seems pointless.
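To make the concern concrete, here is a minimal sketch of EXP3 with delayed feedback. It is not the authors' algorithm: the function names `tuned_eta` and `delayed_exp3` are hypothetical, and the tuning \eta = sqrt(ln K / (K T + D)) with D = sum_t d_t is an assumed stand-in for the kind of delay-dependent rate that Thm 1 appears to use, not its exact expression. The point is only that such an \eta must be fixed before the first round, while the delays it depends on are revealed during the run.

```python
import math
import random


def tuned_eta(num_arms, horizon, total_delay):
    # Assumed tuning (not the paper's Theorem 1): eta shrinks as the
    # total delay D = sum_t d_t grows, so D must be known in advance.
    return math.sqrt(math.log(num_arms) / (num_arms * horizon + total_delay))


def delayed_exp3(losses, delays, eta, seed=0):
    """EXP3 where the loss of round t only arrives at round t + delays[t].

    losses[t][a] is the loss in [0, 1] of arm a at round t.
    Returns the total loss incurred by the learner.
    """
    rng = random.Random(seed)
    horizon, num_arms = len(losses), len(losses[0])
    cum_est = [0.0] * num_arms   # importance-weighted loss estimates received so far
    pending = {}                 # arrival round -> list of (arm, loss, prob)
    total_loss = 0.0
    for t in range(horizon):
        # Exponential weights over the feedback that has arrived so far.
        low = min(cum_est)
        weights = [math.exp(-eta * (c - low)) for c in cum_est]
        norm = sum(weights)
        probs = [w / norm for w in weights]
        arm = rng.choices(range(num_arms), weights=probs)[0]
        loss = losses[t][arm]
        total_loss += loss
        # The learner sees this loss only delays[t] rounds later.
        pending.setdefault(t + delays[t], []).append((arm, loss, probs[arm]))
        # Apply every piece of feedback that arrives at the end of this round.
        for a, l, p in pending.pop(t, []):
            cum_est[a] += l / p
    return total_loss


T, K = 1000, 3
losses = [[0.1, 0.9, 0.9] for _ in range(T)]
delays = [5] * T
# The call below needs sum(delays) before round 0, which the learner does not have.
eta = tuned_eta(K, T, sum(delays))
total = delayed_exp3(losses, delays, eta)
```

In this sketch the learner itself never reads `delays` ahead of time; only the tuning step does. A fix would have to replace that step with something computable online, for instance a rate that adapts to the delays observed so far, and it is unclear from the paper whether the analysis survives such a change.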

adversarial bandit, delayed feedback, online exp3 learning

