Thompson Sampling on Symmetric $\alpha$-Stable Bandits

Abhimanyu Dubey, Alex Pentland

arXiv.org Machine Learning 

Thompson Sampling provides an efficient technique to introduce prior knowledge in the multi-armed bandit problem, along with providing remarkable empirical performance. In this paper, we revisit the Thompson Sampling algorithm under rewards drawn from symmetric α-stable distributions, which are a class of heavy-tailed probability distributions utilized in finance and economics, in problems such as modeling stock prices and human behavior. We present an efficient framework for posterior inference, which leads to two algorithms for Thompson Sampling in this setting.

Rigorous empirical evidence in favor of TS demonstrated by Chapelle and Li [2011] sparked new interest in the theoretical analysis of the algorithm, and the seminal work of Agrawal and Goyal [2012, 2013] and Russo and Van Roy [2014] demonstrated the optimality of TS when rewards are bounded in [0, 1] or are Gaussian. These results were extended in the work of Korda et al. [2013] to more general, exponential family reward distributions. These empirical studies, along with the theoretical guarantees, have established TS as a powerful algorithm for the MAB problem. However, when designing decision-making algorithms for complex systems, we see that interactions in such systems often lead to heavy-tailed and power law distributions, such as
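The setting above can be sketched in code. The sampler below uses the standard Chambers-Mallows-Stuck method to draw symmetric α-stable rewards; the bandit loop is a deliberately simplified Thompson Sampling baseline with a heuristic Gaussian posterior over each arm's mean, not the paper's posterior-inference framework. The function names, the two-arm example, and the `1/sqrt(n)` posterior width are all illustrative assumptions.

```python
import math
import random


def sas_sample(alpha, rng):
    """One symmetric alpha-stable draw via Chambers-Mallows-Stuck.

    alpha in (0, 2]; alpha=1 is Cauchy, alpha=2 is Gaussian (scale sqrt(2)).
    """
    u = rng.uniform(-math.pi / 2, math.pi / 2)  # uniform angle
    w = rng.expovariate(1.0)                    # unit exponential
    if abs(alpha - 1.0) < 1e-12:
        return math.tan(u)                      # Cauchy special case
    return (math.sin(alpha * u) / math.cos(u) ** (1.0 / alpha)
            * (math.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha))


def thompson_sampling(means, alpha, horizon, rng):
    """Gaussian-posterior TS on arms with additive sAS noise (toy baseline).

    Returns the pull count for each arm after `horizon` rounds.
    """
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    for a in range(k):  # initialize: pull each arm once
        counts[a] += 1
        sums[a] += means[a] + sas_sample(alpha, rng)
    for _ in range(horizon - k):
        # Sample a plausible mean per arm from N(mean_hat, 1/n) and pick the max.
        idx = max(range(k),
                  key=lambda a: rng.gauss(sums[a] / counts[a],
                                          1.0 / math.sqrt(counts[a])))
        counts[idx] += 1
        sums[idx] += means[idx] + sas_sample(alpha, rng)
    return counts
```

Because α-stable rewards with α < 2 have infinite variance, the empirical-mean posterior used here is fragile under outliers, which is precisely the failure mode that motivates the dedicated posterior-inference framework the paper develops.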
