Bandit-Based Planning and Learning in Continuous-Action Markov Decision Processes

Weinstein, Ari (Rutgers University) | Littman, Michael L. (Rutgers University)

AAAI Conferences 

Recent research leverages results from the continuous-armed bandit literature to create a reinforcement-learning algorithm for continuous state and action spaces. Although the algorithm was initially proposed in a purely theoretical setting, we provide the first examination of its empirical properties. Through experimentation, we demonstrate the effectiveness of this planning method when coupled with exploration and model learning, and show that, in addition to its formal guarantees, the approach is highly competitive with other continuous-action reinforcement learners.
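
To make the high-level idea concrete, the sketch below illustrates one way bandit-based planning can be applied to a continuous-action problem: the choice of a fixed-horizon, open-loop action sequence is treated as a single continuous arm, and a hierarchical (HOO-style) bisection bandit searches that space using simulated rollouts as reward. This is only an illustrative sketch, not the authors' algorithm; the toy double-integrator dynamics and the names rollout, Node, and plan are assumptions introduced for the example.

```python
"""Illustrative sketch: bandit-based open-loop planning in a continuous-action MDP.

NOT the paper's exact method.  A HOO-style bisection bandit searches the space of
fixed-horizon action sequences (flattened into [0, 1]^d); each candidate sequence
is evaluated by rolling it out in a toy simulator (a 1-D double integrator)."""
import math
import random


def rollout(action_seq, x=0.0, v=0.0, dt=0.1):
    """Return the (negative cost) of an open-loop action sequence on a toy
    double integrator; actions are rescaled from [0, 1] to [-1, 1]."""
    total = 0.0
    for a in action_seq:
        u = 2.0 * a - 1.0              # map [0, 1] -> [-1, 1]
        v += u * dt
        x += v * dt
        total -= x * x + 0.1 * u * u   # quadratic cost on state and control
    return total


class Node:
    """A cell of the bisection tree over [0, 1]^d (splits one coordinate at a time)."""
    def __init__(self, lo, hi, depth):
        self.lo, self.hi, self.depth = lo, hi, depth
        self.n, self.mean = 0, 0.0
        self.children = None

    def split(self):
        d = self.depth % len(self.lo)          # cycle through coordinates
        mid = (self.lo[d] + self.hi[d]) / 2.0
        left_hi = list(self.hi); left_hi[d] = mid
        right_lo = list(self.lo); right_lo[d] = mid
        self.children = [Node(list(self.lo), left_hi, self.depth + 1),
                         Node(right_lo, list(self.hi), self.depth + 1)]


def plan(horizon=5, budget=500, c=1.0):
    """Optimistically search the space of action sequences with a bandit tree."""
    root = Node([0.0] * horizon, [1.0] * horizon, 0)
    best_seq, best_ret = None, -float("inf")
    for t in range(1, budget + 1):
        # Descend the tree, preferring children with high upper-confidence bounds.
        path, node = [root], root
        while node.children is not None:
            node = max(node.children,
                       key=lambda ch: float("inf") if ch.n == 0 else
                       ch.mean + c * math.sqrt(math.log(t) / ch.n)
                       + 2.0 ** (-ch.depth))   # diameter bonus, HOO-style
            path.append(node)
        # Evaluate a random point in the selected cell with a simulated rollout.
        seq = [random.uniform(l, h) for l, h in zip(node.lo, node.hi)]
        ret = rollout(seq)
        if ret > best_ret:
            best_seq, best_ret = seq, ret
        # Update statistics along the path and expand the visited leaf.
        for nd in path:
            nd.n += 1
            nd.mean += (ret - nd.mean) / nd.n
        node.split()
    return best_seq, best_ret


if __name__ == "__main__":
    seq, ret = plan()
    print("best open-loop action sequence:", [round(a, 3) for a in seq])
    print("estimated return:", round(ret, 3))
```

In the full learning setting described in the abstract, the simulator used by rollout would be replaced with a learned model, and the planner would be re-invoked at each decision step in a receding-horizon fashion.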
