Epsilon-Greedy Thompson Sampling to Bayesian Optimization

Mar-1-2024–arXiv.org Machine Learning

Thompson sampling (TS) addresses the exploitation-exploration dilemma in Bayesian optimization (BO). It prioritizes exploration by randomly generating and maximizing sample paths of Gaussian process (GP) posteriors, but it manages exploitation only weakly, through the information about the true objective function that it gathers after each exploration step. In this study, we incorporate the epsilon-greedy ($\varepsilon$-greedy) policy, a well-established selection strategy in reinforcement learning, into TS to improve its exploitation. We first delineate two extremes of TS applied to BO, namely the generic TS and a sample-average TS, which promote exploration and exploitation, respectively. We then use the $\varepsilon$-greedy policy to switch randomly between the two extremes. A small value of $\varepsilon \in (0,1)$ prioritizes exploitation, and a large value prioritizes exploration. We empirically show that $\varepsilon$-greedy TS with an appropriate $\varepsilon$ is better than one of its two extremes and competes with the other.
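The switching rule can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not state: a zero-mean GP with a squared-exponential kernel, a one-dimensional search space discretized on a grid, and a selection rule in which generic TS is chosen with probability $\varepsilon$ and sample-average TS otherwise (inferred from the statement that a small $\varepsilon$ prioritizes exploitation). The names `rbf_kernel`, `gp_posterior`, `eps_greedy_ts_step`, and `run_bo` are hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.2):
    """Squared-exponential kernel between 1-D point sets a and b (assumed kernel)."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_obs, y_obs, x_grid, noise=1e-4):
    """Posterior mean and covariance of a zero-mean GP evaluated on x_grid."""
    K = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    Ks = rbf_kernel(x_grid, x_obs)
    Kss = rbf_kernel(x_grid, x_grid)
    mean = Ks @ np.linalg.solve(K, y_obs)
    cov = Kss - Ks @ np.linalg.solve(K, Ks.T)
    return mean, cov

def eps_greedy_ts_step(x_obs, y_obs, x_grid, eps, n_paths, rng):
    """Pick the next query point by switching between the two TS extremes."""
    mean, cov = gp_posterior(x_obs, y_obs, x_grid)
    cov = cov + 1e-8 * np.eye(len(x_grid))  # small jitter for numerical stability
    if rng.random() < eps:
        # Generic TS (exploration): maximize a single posterior sample path.
        path = rng.multivariate_normal(mean, cov, check_valid="ignore")
        return x_grid[np.argmax(path)], "generic"
    # Sample-average TS (exploitation): maximize the average of several paths.
    paths = rng.multivariate_normal(mean, cov, size=n_paths, check_valid="ignore")
    return x_grid[np.argmax(paths.mean(axis=0))], "sample-average"

def run_bo(objective, x_init, x_grid, eps=0.3, n_paths=20, n_iter=10, seed=0):
    """Run a short BO loop (maximization) and return all queried points and values."""
    rng = np.random.default_rng(seed)
    x_obs = np.asarray(x_init, dtype=float)
    y_obs = objective(x_obs)
    for _ in range(n_iter):
        x_new, _ = eps_greedy_ts_step(x_obs, y_obs, x_grid, eps, n_paths, rng)
        x_obs = np.append(x_obs, x_new)
        y_obs = np.append(y_obs, objective(np.array([x_new]))[0])
    return x_obs, y_obs
```

For example, `run_bo(lambda x: -(x - 0.3) ** 2, [0.0, 1.0], np.linspace(0.0, 1.0, 50))` runs ten iterations on a toy objective chosen only for illustration. Setting `eps=1.0` in this sketch gives the generic TS at every step, while `eps=0.0` gives the sample-average TS at every step.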

artificial intelligence, machine learning, optimization problem

Country:
- North America > United States
  - Texas > Harris County > Houston (0.14)
- Europe > United Kingdom
  - England (0.14)

Genre:
- Research Report > New Finding (0.34)

Industry:
- Energy > Oil & Gas > Upstream (0.34)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning (1.00)
  - Representation & Reasoning > Optimization (0.69)