Bounded Optimal Exploration in MDP

Kawaguchi, Kenji (Massachusetts Institute of Technology)

Proceedings of the AAAI Conference on Artificial Intelligence (AAAI-16)

Within the framework of probably approximately correct Markov decision processes (PAC-MDP), much theoretical work has focused on methods that attain near-optimality after a relatively long period of learning and exploration. In practice, however, an agent must often achieve satisfactory behavior within a short period of time. In this paper, we relax the PAC-MDP conditions to reconcile theoretically driven exploration methods with these practical needs. We propose simple algorithms for discrete and continuous state spaces, and illustrate the benefits of the proposed relaxation via theoretical analyses and numerical examples. Our algorithms also maintain anytime error bounds and average loss bounds, and our approach accommodates both Bayesian and non-Bayesian methods.
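The abstract refers to theoretically driven exploration methods in the PAC-MDP framework without defining one. As a point of reference only, the following is a minimal sketch of a classic method of that kind, R-MAX-style optimistic exploration in a tabular MDP. It is not the paper's bounded-optimal algorithm; the toy MDP, the known-ness threshold m, and all other names and constants are hypothetical choices for this illustration.

    import numpy as np

    rng = np.random.default_rng(0)

    n_states, n_actions = 5, 2
    gamma, r_max, m = 0.95, 1.0, 5   # m: visits before (s, a) counts as "known"

    # Hypothetical toy MDP: random transition kernel and deterministic rewards.
    P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
    R = rng.uniform(0.0, r_max, size=(n_states, n_actions))

    counts = np.zeros((n_states, n_actions))
    trans = np.zeros((n_states, n_actions, n_states))
    rew = np.zeros((n_states, n_actions))

    def plan(q_iters=200):
        """Value iteration on the empirical model, treating unknown
        (s, a) pairs optimistically as if they paid r_max forever."""
        q = np.zeros((n_states, n_actions))
        for _ in range(q_iters):
            v = q.max(axis=1)
            for s in range(n_states):
                for a in range(n_actions):
                    if counts[s, a] >= m:      # known: use empirical estimates
                        p = trans[s, a] / counts[s, a]
                        q[s, a] = rew[s, a] / counts[s, a] + gamma * (p @ v)
                    else:                      # unknown: optimistic value
                        q[s, a] = r_max / (1.0 - gamma)
        return q

    s, q = 0, plan()
    for t in range(2000):
        a = int(q[s].argmax())                 # greedy w.r.t. the optimistic model
        s_next = rng.choice(n_states, p=P[s, a])
        counts[s, a] += 1
        trans[s, a, s_next] += 1
        rew[s, a] += R[s, a]
        if counts[s, a] == m:                  # a pair just became known: replan
            q = plan()
        s = s_next

Unknown state-action pairs receive the optimistic value r_max / (1 - gamma), which drives the agent to visit them; once a pair has been sampled m times, the agent switches to its empirical model for that pair. The paper's relaxation concerns how quickly such exploration schemes can be made to behave well, which this sketch does not capture.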
