A Survey of Risk-Aware Multi-Armed Bandits
Vincent Y. F. Tan, Prashanth L. A., Krishna Jagannathan
May 11, 2022

In many applications, such as clinical trials or portfolio selection, the expected reward alone does not satisfactorily capture the merits of a drug or a portfolio. In such applications, risk plays a crucial role, and a risk-aware performance measure is preferable, so as to capture losses in the case of adverse events. This survey aims to consolidate and summarise the existing research on risk measures, specifically in the context of multi-armed bandits. We review various risk measures of interest and comment on their properties. Next, we review existing concentration inequalities for various risk measures. We then define risk-aware bandit problems and consider algorithms for the regret minimization and best-arm identification settings.

There are two general sub-problems in the MAB literature, namely, regret minimization and best-arm identification (also called pure exploration). In the former, in which the exploration-exploitation trade-off manifests, the player wants to maximize his reward over a fixed time period. In the latter, the player simply wants to learn which arm is the best, either in the shortest time possible with a given probability of success (the fixed-confidence setting) or with the highest probability of success given a fixed playing horizon (the fixed-budget setting). In most of the MAB literature (see Lattimore and Szepesvári (2020) for an up-to-date survey), the metric of interest is simply the mean of the reward distribution associated with the arm pulled. A standard formalization of these two settings is sketched below.
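To fix notation for these two settings, one standard formalization reads as follows (the symbols K, T, \mu_i, A_t are our choices of notation, not the survey's).

Regret minimization. With K arms whose reward distributions \nu_1, \dots, \nu_K have means \mu_i = \mathbb{E}_{X \sim \nu_i}[X], a player pulling arm A_t at time t seeks to minimize the expected regret over the horizon T,

    \mathcal{R}_T = T\mu^{*} - \mathbb{E}\Big[\sum_{t=1}^{T} \mu_{A_t}\Big], \qquad \mu^{*} = \max_{1 \le i \le K} \mu_i .

Best-arm identification. The player outputs a guess \hat{I} of the best arm i^{*} = \arg\max_i \mu_i. In the fixed-confidence setting, the goal is to minimize the number of pulls subject to \mathbb{P}(\hat{I} \ne i^{*}) \le \delta for a given \delta \in (0,1); in the fixed-budget setting, the goal is to minimize \mathbb{P}(\hat{I} \ne i^{*}) after a given number T of pulls.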
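To illustrate the abstract's point that a mean-based criterion can mask downside risk, the following small sketch (ours, not from the survey) compares the empirical mean with the empirical conditional value-at-risk (CVaR) of two reward distributions with equal means; CVaR at level \alpha is taken here to be the average of the worst \alpha-fraction of rewards.

import numpy as np

def cvar(samples, alpha=0.05):
    # Empirical CVaR at level alpha for rewards: the average of the
    # worst (lowest) alpha-fraction of samples. A lower CVaR means
    # heavier downside risk, even when the means coincide.
    sorted_samples = np.sort(samples)
    k = max(1, int(np.ceil(alpha * len(sorted_samples))))
    return sorted_samples[:k].mean()

rng = np.random.default_rng(0)
n = 100_000

# Two hypothetical "arms" with the same mean reward of 1.0 ...
safe = rng.normal(loc=1.0, scale=0.1, size=n)   # low-variance arm
risky = rng.normal(loc=1.0, scale=2.0, size=n)  # high-variance arm

print(f"mean:    safe={safe.mean():.3f}  risky={risky.mean():.3f}")
print(f"CVaR@5%: safe={cvar(safe):.3f}  risky={cvar(risky):.3f}")

The two arms are indistinguishable by their means, but CVaR at the 5% level is far lower for the high-variance arm; this is exactly the distinction that risk-aware performance measures are designed to capture.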