Selective Reviews of Bandit Problems in AI via a Statistical View

Zhou, Pengjie, Wei, Haoyu, Zhang, Huiming

arXiv.org Machine Learning 

Introduction

Reinforcement Learning (RL) is one of the most prominent and widely discussed methods in artificial intelligence. It focuses on how an agent learns to make decisions by interacting with an environment so as to maximize cumulative reward [1]. RL has been applied extensively in domains including autonomous driving [2], recommendation systems [3], unmanned aerial vehicles (UAVs) [4], financial trading [5], causal inference [6], and precision medicine [7,8]; see [9,10] for reviews. A classic, simplified problem in RL is the stochastic bandit problem. Stochastic bandit problems exemplify the exploration-exploitation tradeoff, in which an agent must choose between exploring new options to gather more information and exploiting known options to maximize rewards. The current review literature on stochastic bandit algorithms highlights applications in areas such as recommendation systems [11-13], experimental design [14], precision medicine [8], and causal inference [15]. Efficient bandit algorithms are designed from a statistical perspective, yet this perspective remains underexplored in existing reviews. This paper aims to address this gap by focusing on the probabilistic and statistical foundations of stochastic bandit algorithms, with particular emphasis on concentration inequalities, minimax rates of regret upper bounds, small-sample statistical inference, linear models, Bayesian optimization, statistical learning theory, design of experiments, the Neyman-Rubin causal model, functional data analysis, robust statistics, and information theory, among other topics.
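The exploration-exploitation tradeoff described above can be made concrete with a small simulation. The sketch below is illustrative only and is not taken from the paper: it assumes Bernoulli-reward arms and uses the epsilon-greedy rule, chosen here purely for its simplicity. The function name `run_epsilon_greedy` and all parameter names are hypothetical.

```python
import random


def run_epsilon_greedy(true_means, horizon, epsilon, seed=0):
    """Simulate epsilon-greedy on Bernoulli arms (illustrative assumption).

    Returns the number of pulls per arm and the cumulative pseudo-regret,
    i.e. the sum over rounds of (best mean - mean of the chosen arm).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # pulls per arm
    estimates = [0.0] * n_arms   # running sample mean of rewards per arm
    best_mean = max(true_means)
    regret = 0.0

    for _ in range(horizon):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest estimated mean.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # Draw a Bernoulli reward with the arm's true mean.
        reward = 1.0 if rng.random() < true_means[arm] else 0.0

        # Incremental update of the sample mean.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

        regret += best_mean - true_means[arm]

    return counts, regret


if __name__ == "__main__":
    counts, regret = run_epsilon_greedy([0.2, 0.8], horizon=1000, epsilon=0.1)
    print("pulls per arm:", counts)
    print("cumulative pseudo-regret:", round(regret, 2))
```

Setting `epsilon` to 0 gives a purely greedy agent that can lock onto a suboptimal arm, while a larger `epsilon` gathers more information at the cost of more pulls on arms already known to be poor.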
