Squeeze All: Novel Estimator and Self-Normalized Bound for Linear Contextual Bandits

Kim, Wonyoung, Paik, Myunghee Cho, Oh, Min-hwan

Mar-28-2023–arXiv.org Artificial Intelligence

We propose a linear contextual bandit algorithm with an $O(\sqrt{dT\log T})$ regret bound, where $d$ is the dimension of the contexts and $T$ is the time horizon. The algorithm is equipped with a novel estimator in which exploration is embedded through explicit randomization. Depending on the randomization, the estimator takes contributions either from the contexts of all arms or from the selected contexts only. We establish a self-normalized bound for this estimator, which allows a novel decomposition of the cumulative regret into additive dimension-dependent terms instead of multiplicative ones. We also prove a novel lower bound of $\Omega(\sqrt{dT})$ under our problem setting. Hence, the regret of the proposed algorithm matches the lower bound up to logarithmic factors. Numerical experiments support the theoretical guarantees and show that the proposed method outperforms existing linear bandit algorithms.
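
As a rough illustration of the "up to logarithmic factors" claim, the two rates stated above can be compared directly. This is only a sketch: it ignores the constants hidden in the $O(\cdot)$ and $\Omega(\cdot)$ notation, which the abstract does not state.

$$\frac{\sqrt{dT\log T}}{\sqrt{dT}} = \sqrt{\log T}$$

Under that simplification, the gap between the upper and lower bounds grows only as $\sqrt{\log T}$ and does not depend on $d$.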

artificial intelligence, data mining, machine learning

Country:
- Europe (0.28)
- North America > United States
  - New York (0.28)

Genre:
- Research Report (0.82)

Technology:
- Information Technology
  - Artificial Intelligence > Machine Learning (1.00)
  - Data Science > Data Mining
    - Big Data (0.69)
