On-line Policy Improvement using Monte-Carlo Search

Gerald Tesauro, Gregory R. Galperin

Neural Information Processing Systems 

Policy iteration is known to have rapid and robust convergence properties, and for Markov tasks with lookup-table state-space representations, it is guaranteed to converge to the optimal policy. In typical uses of policy iteration, the policy improvement step is an extensive off-line procedure. For example, in dynamic programming, one performs a sweep through all states in the state space. Reinforcement learning provides another approach to policy improvement; recently, several authors have investigated using RL in conjunction with nonlinear function approximators to represent the value functions and/or policies (Tesauro, 1992; Crites and Barto, 1996; Zhang and Dietterich, 1996). These studies are based on following actual state-space trajectories rather than sweeps through the full state space, but are still too slow to compute improved policies in real time.
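To make the off-line character of the improvement step concrete, the following is a minimal sketch of tabular policy iteration, the baseline procedure the paragraph describes. It assumes a fully known MDP given as a hypothetical transition table `P[s][a]`, a list of `(prob, next_state, reward)` triples; the names and interface are illustrative, not from the paper.

```python
import numpy as np

def policy_iteration(P, n_states, n_actions, gamma=0.95, tol=1e-8):
    """Tabular policy iteration. P[s][a] is a list of
    (prob, next_state, reward) triples for action a in state s."""
    policy = np.zeros(n_states, dtype=int)   # arbitrary initial policy
    V = np.zeros(n_states)
    while True:
        # Policy evaluation: iterate the Bellman expectation backup
        # until the value function stops changing.
        while True:
            delta = 0.0
            for s in range(n_states):
                v = sum(p * (r + gamma * V[s2])
                        for p, s2, r in P[s][policy[s]])
                delta = max(delta, abs(v - V[s]))
                V[s] = v
            if delta < tol:
                break
        # Policy improvement: an off-line sweep through *every* state,
        # replacing each action with the greedy one under the current V.
        stable = True
        for s in range(n_states):
            q = [sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                 for a in range(n_actions)]
            best = int(np.argmax(q))
            if best != policy[s]:
                policy[s] = best
                stable = False
        if stable:        # no state changed its action: policy is optimal
            return policy, V
```

Note that the improvement step visits every state on every iteration, which is exactly what makes it impractical in real time for large state spaces and motivates the on-line, single-state Monte-Carlo alternative proposed in this paper.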
