UCB Algorithm for Exponential Distributions

Apr-7-2012–arXiv.org Machine Learning

Several sequential decision making problems face a dilemma between the exploration of a space of choices, or solutions, and the exploitation of the information available to the decision maker. The problem described herein is known as sequential decision making under uncertainty. In this paper we focus on a subclass of this problem, where the decision maker has a discrete set of stateless choices and the added information is a real valued sequence (of feedbacks, or rewards) that quantifies how well the decision maker behaved in the previous time steps. This particular instance of sequential decision making problems is generally known as the multi-armed bandit (MAB) problem [1], [2]. A common approach to solving the exploration versus exploitation dilemma within MAB problems consists in assigning an utility value to every arm.

artificial intelligence, data mining, machine learning, (18 more...)

arXiv.org Machine Learning

Apr-7-2012

arXiv.org PDF

Add feedback

Genre:
- Research Report (0.40)

Industry:
- Health & Medicine > Pharmaceuticals & Biotechnology (0.41)

Technology:
- Information Technology
  - Artificial Intelligence > Machine Learning (1.00)
  - Data Science > Data Mining
    - Big Data (0.50)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found