An Adversarial Analysis of Thompson Sampling for Full-information Online Learning: from Finite to Infinite Action Spaces

Feb-24-2025–arXiv.org Machine Learning

We develop an analysis of Thompson sampling for online learning under full feedback, also known as prediction with expert advice, in which the learner's prior is defined over the space of the adversary's future actions rather than over the space of experts. We show that regret decomposes into the regret the learner expected a priori plus a prior-robustness-type term we call excess regret. In the classical finite-expert setting, this recovers optimal rates. As an initial step towards practical online learning in settings with a potentially uncountably infinite number of experts, we show that Thompson sampling with a certain Gaussian process prior widely used in the Bayesian optimization literature achieves an $\mathcal{O}(\beta\sqrt{T\log(1+\lambda)})$ rate against a $\beta$-bounded $\lambda$-Lipschitz adversary.
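To make the setup in the abstract concrete, the following is a minimal sketch of one way to read it in the finite-expert case, not the paper's algorithm or prior. It assumes rewards rather than losses, and it assumes an independent zero-mean Gaussian prior over the adversary's future reward vectors; the function name `thompson_sampling_full_info` and the parameter `prior_std` are hypothetical. Under that independence assumption, conditioning on the observed history leaves the distribution of future rounds unchanged, so each round reduces to adding a freshly sampled perturbation to the observed cumulative rewards and playing the best expert.

```python
import numpy as np


def thompson_sampling_full_info(rewards, rng, prior_std=1.0):
    """Play T rounds against a fixed reward table of shape (T, K).

    Assumption (not from the paper): the learner's prior treats every
    future reward as an independent N(0, prior_std^2) draw.
    """
    T, K = rewards.shape
    cumulative = np.zeros(K)          # observed total reward of each expert
    choices = np.empty(T, dtype=int)  # expert played in each round

    for t in range(T):
        # Sample the adversary's remaining T - t reward vectors from the
        # prior and sum them per expert. Because the assumed prior is
        # independent across rounds, the posterior over the future
        # coincides with the prior.
        future = rng.normal(0.0, prior_std, size=(T - t, K)).sum(axis=0)

        # Play the expert that is best for observed past + sampled future.
        choices[t] = int(np.argmax(cumulative + future))

        # Full feedback: the rewards of all experts are revealed.
        cumulative += rewards[t]

    learner_total = rewards[np.arange(T), choices].sum()
    # Regret against the best fixed expert in hindsight.
    regret = rewards.sum(axis=0).max() - learner_total
    return choices, regret


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    table = rng.uniform(0.0, 1.0, size=(200, 5))  # hypothetical adversary
    _, regret = thompson_sampling_full_info(table, rng)
    print(f"regret after 200 rounds: {regret:.3f}")
```

Setting `prior_std=0` removes the randomness and turns the sketch into follow-the-leader, which is a convenient sanity check. A prior with dependence across rounds or across experts, such as the Gaussian process prior mentioned in the abstract, would instead require sampling the future conditionally on the observed history.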



Country:
- North America > United States
  - California > San Francisco County > San Francisco (0.14)
- Europe
  - United Kingdom > England
    - Cambridgeshire > Cambridge (0.04)
  - Germany > Lower Saxony
    - Gottingen (0.04)

Genre:
- Research Report (0.50)

Industry:
- Education > Educational Setting > Online (0.82)

Technology:
- Information Technology
  - Game Theory (1.00)
  - Data Science (0.67)
  - Enterprise Applications > Human Resources
    - Learning Management (0.82)
  - Artificial Intelligence
    - Representation & Reasoning > Uncertainty
      - Bayesian Inference (0.46)
    - Machine Learning > Learning Graphical Models
      - Directed Networks > Bayesian Learning (0.67)
