Active Policy Iteration: Efficient Exploration through Active Learning for Value Function Approximation in Reinforcement Learning

Akiyama, Takayuki (Tokyo Institute of Technology) | Hachiya, Hirotaka (Tokyo Institute of Technology) | Sugiyama, Masashi (Tokyo Institute of Technology)

Jun-23-2009–AAAI Conferences

Designing sampling policies appropriately is crucial for obtaining better control policies in reinforcement learning. In this paper, we first show that the least-squares policy iteration (LSPI) framework allows us to employ statistical active learning methods for linear regression. We then propose a method for designing good sampling policies for efficient exploration, which is particularly useful when sampling immediate rewards is costly. We demonstrate the usefulness of the proposed method, named active policy iteration (API), through simulations with a batting robot.

generalization error, immediate reward, iteration

Country:
- North America > United States
  - Massachusetts > Middlesex County
    - Cambridge (0.04)
  - Illinois > Cook County
    - Chicago (0.04)
- Asia
  - Middle East > Jordan (0.04)
  - Japan > Honshū
    - Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Reinforcement Learning (0.71)
  - Representation & Reasoning > Uncertainty
    - Fuzzy Logic (0.41)
