Logarithmic Online Regret Bounds for Undiscounted Reinforcement Learning

Dec-31-2007–Neural Information Processing Systems

We present a learning algorithm for undiscounted reinforcement learning. Our interest lies in bounds for the algorithm's online performance after some finite number of steps. In the spirit of similar methods already successfully applied to the exploration-exploitation tradeoff in multi-armed bandit problems, we use upper confidence bounds to show that our UCRL algorithm achieves online regret, measured with respect to an optimal policy, that is logarithmic in the number of steps taken.
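To make the upper-confidence-bound idea concrete, here is a minimal sketch of the classic UCB1 index rule for a multi-armed bandit, the setting the abstract cites as inspiration. It is not the UCRL algorithm from the paper, which the abstract does not specify. The function name `ucb1`, the Bernoulli reward model, and all parameters are assumptions chosen for illustration.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """Run UCB1 on hypothetical Bernoulli arms; return (pull counts, total reward)."""
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms   # number of pulls per arm
    sums = [0.0] * n_arms   # summed reward per arm
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= n_arms:
            # Pull every arm once so each empirical mean is defined.
            arm = t - 1
        else:
            # Index = empirical mean + confidence width. Rarely pulled arms
            # get a large width, so they are still explored occasionally.
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        # Assumed reward model: 1 with probability arm_means[arm], else 0.
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return counts, total


if __name__ == "__main__":
    counts, total = ucb1([0.3, 0.5, 0.7], horizon=5000)
    # Regret relative to always playing the best arm (mean 0.7).
    print(counts, 0.7 * 5000 - total)
```

In this sketch, regret is the gap between the expected reward of always playing the best arm and the reward actually collected. For bandits, the confidence width shrinks as an arm is pulled more often, which is what keeps the number of suboptimal pulls growing only logarithmically in the horizon.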

data mining, machine learning, reinforcement learning

Country:
- Europe > Austria (0.14)

Technology:
- Information Technology
  - Data Science > Data Mining
    - Big Data (0.90)
  - Artificial Intelligence > Machine Learning
    - Reinforcement Learning (0.87)
