Leveraging priors on distribution functions for multi-arm bandits

Vashishtha, Sumit; Maillard, Odalric-Ambrym

Mar-6-2025–arXiv.org Machine Learning

We introduce Dirichlet Process Posterior Sampling (DPPS), a Bayesian non-parametric algorithm for multi-arm bandits based on Dirichlet Process (DP) priors. Like Thompson sampling, DPPS is a probability-matching algorithm, i.e., it plays each arm with probability equal to its posterior probability of being optimal. Instead of assuming a parametric class for the reward-generating distribution of each arm and then putting a prior on the parameters, DPPS models the reward-generating distribution directly with DP priors. DPPS provides a principled way to incorporate prior beliefs about the bandit environment, and in the non-informative limit of the DP posteriors (i.e., the Bayesian bootstrap) we recover Non-Parametric Thompson Sampling (NPTS), a popular non-parametric bandit algorithm, as a special case of DPPS. We employ the stick-breaking representation of the DP priors and show excellent empirical performance of DPPS in challenging synthetic and real-world bandit environments. Finally, using an information-theoretic analysis, we show the non-asymptotic optimality of DPPS in the Bayesian regret setup.
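To make the probability-matching idea concrete, here is a minimal sketch of a DPPS-style loop in Python/NumPy. It assumes bounded rewards, a uniform base measure H on [0, 1], and a fixed stick-breaking truncation level; the function names (`stick_breaking_draw`, `sample_posterior_mean`, `dpps`) and the Bernoulli test arms are hypothetical. Each arm's posterior DP(alpha + n, (alpha H + sum_i delta_{x_i}) / (alpha + n)) is sampled by splitting it into a Dirichlet-weighted mix of the observed rewards and a truncated stick-breaking draw from DP(alpha, H). This construction is our choice, not necessarily the authors'. Setting `alpha=0.0` gives the Bayesian-bootstrap limit that the abstract relates to NPTS; published NPTS may differ in details not modelled here.

```python
import numpy as np


def stick_breaking_draw(alpha, base_sampler, rng, n_atoms=100):
    """Truncated stick-breaking draw G ~ DP(alpha, H), returned as (atoms, weights)."""
    v = rng.beta(1.0, alpha, size=n_atoms)
    v[-1] = 1.0  # truncation: the last stick takes all remaining mass
    # w_k = v_k * prod_{j<k} (1 - v_j)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    weights = v * remaining
    atoms = base_sampler(rng, n_atoms)  # i.i.d. atoms from the base measure H
    return atoms, weights


def sample_posterior_mean(rewards, alpha, base_sampler, rng, n_atoms=100):
    """Mean of one random distribution F drawn from the DP posterior
    DP(alpha + n, (alpha * H + sum_i delta_{x_i}) / (alpha + n)).

    Uses F = p0 * G + sum_i p_i * delta_{x_i}, with
    (p0, p1, ..., pn) ~ Dirichlet(alpha, 1, ..., 1) and G ~ DP(alpha, H).
    alpha == 0 is the Bayesian bootstrap: Dirichlet(1, ..., 1) weights on
    the observed rewards only.
    """
    x = np.asarray(rewards, dtype=float)
    if alpha == 0.0:
        if x.size == 0:
            raise ValueError("alpha = 0 needs at least one observed reward")
        return float(rng.dirichlet(np.ones(x.size)) @ x)
    atoms, w = stick_breaking_draw(alpha, base_sampler, rng, n_atoms)
    prior_part = float(w @ atoms)  # mean of the prior-DP component G
    if x.size == 0:
        return prior_part  # no data yet: the posterior is the prior
    p = rng.dirichlet(np.concatenate(([alpha], np.ones(x.size))))
    return float(p[0] * prior_part + p[1:] @ x)


def dpps(pull, n_arms, horizon, alpha, base_sampler, seed=0):
    """Probability matching: each round, draw one posterior mean per arm
    and play the arm with the largest draw."""
    rng = np.random.default_rng(seed)
    history = [[] for _ in range(n_arms)]
    choices = []
    for _ in range(horizon):
        unplayed = [a for a in range(n_arms) if not history[a]]
        if alpha == 0.0 and unplayed:
            arm = unplayed[0]  # the bootstrap limit needs one reward per arm
        else:
            draws = [sample_posterior_mean(h, alpha, base_sampler, rng)
                     for h in history]
            arm = int(np.argmax(draws))
        history[arm].append(pull(arm, rng))
        choices.append(arm)
    return choices, history


if __name__ == "__main__":
    probs = [0.3, 0.5, 0.7]  # hypothetical Bernoulli arms
    uniform_h = lambda r, k: r.uniform(0.0, 1.0, size=k)  # H = Uniform[0, 1]
    choices, _ = dpps(lambda a, r: float(r.random() < probs[a]),
                      n_arms=3, horizon=1000, alpha=1.0, base_sampler=uniform_h)
    print([choices.count(a) for a in range(3)])
```

Drawing a whole random distribution per arm and then taking its mean is what makes this probability matching: an arm is played exactly when its sampled mean is the largest. Larger `alpha` places more weight on the prior H, while `alpha` near 0 lets the observed rewards dominate.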

algorithm, bandit environment, posterior, (16 more...)

Country:
- Europe
  - United Kingdom > England
    - Cambridgeshire > Cambridge (0.04)
  - France
    - Hauts-de-France > Nord
      - Lille (0.04)
    - Grand Est > Meurthe-et-Moselle
      - Nancy (0.04)
- Asia > Japan
  - Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)

Genre:
- Research Report (1.00)

Industry:
- Food & Agriculture > Agriculture (1.00)
- Education (0.67)

Technology:
- Information Technology
  - Data Science > Data Mining
    - Big Data (0.67)
  - Artificial Intelligence
    - Representation & Reasoning > Uncertainty
      - Bayesian Inference (1.00)
    - Machine Learning
      - Statistical Learning (0.89)
      - Learning Graphical Models > Directed Networks
        - Bayesian Learning (1.00)