Bandit and Delayed Feedback in Online Structured Prediction

Jun-11-2026, 06:36:57 GMT–Neural Information Processing Systems

Online structured prediction is a task of sequentially predicting outputs with complex structures based on inputs and past observations, encompassing online classification. Recent studies showed that in the full-information setting, we can achieve finite bounds on the *surrogate regret*, *i.e.,* the extra target loss relative to the best possible surrogate loss. In practice, however, full-information feedback is often unrealistic as it requires immediate access to the whole structure of complex outputs. Motivated by this, we propose algorithms that work with less demanding feedback, *bandit* and *delayed* feedback. For bandit feedback, by using a standard inverse-weighted gradient estimator, we achieve a surrogate regret bound of $O(\sqrt{KT})$ for the time horizon $T$ and the size of the output set $K$. However, $K$ can be extremely large when outputs are highly complex, resulting in an undesirable bound.

artificial intelligence, machine learning, proceedings, (6 more...)

Neural Information Processing Systems

Jun-11-2026, 06:36:57 GMT

Conferences Web Page

Add feedback

Genre:
- Research Report (0.59)

Industry:
- Education > Educational Setting > Online (0.59)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning (1.00)