UCB Algorithm for Exponential Distributions

Jouini, Wassim, Moy, Christophe

arXiv.org Machine Learning 

Several sequential decision making problems face a dilemma between the exploration of a space of choices, or solutions, and the exploitation of the information available to the decision maker. The problem described herein is known as sequential decision making under uncertainty. In this paper we focus on a subclass of this problem, where the decision maker has a discrete set of stateless choices and the added information is a real valued sequence (of feedbacks, or rewards) that quantifies how well the decision maker behaved in the previous time steps. This particular instance of sequential decision making problems is generally known as the multi-armed bandit (MAB) problem [1], [2]. A common approach to solving the exploration versus exploitation dilemma within MAB problems consists in assigning an utility value to every arm.

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found