Fast Online Policy Gradient Learning with SMD Gain Vector Adaptation

Yu, Jin; Aberdeen, Douglas; Schraudolph, Nicol N.

Dec-31-2006–Neural Information Processing Systems

Reinforcement learning by direct policy gradient estimation is attractive in theory but in practice leads to notoriously ill-behaved optimization problems. We improve its robustness and speed of convergence with stochastic meta-descent, a gain vector adaptation method that employs fast Hessian-vector products. In our experiments the resulting algorithms outperform previously employed online stochastic, offline conjugate, and natural policy gradient methods.
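As a rough illustration of the gain vector adaptation idea mentioned in the abstract, the sketch below applies a stochastic meta-descent style update to a small deterministic quadratic. It is not the authors' policy gradient algorithm: the toy loss, the function names (`quadratic_grad`, `hessian_vector_product`, `smd_minimize`), the parameter values, and the finite-difference Hessian-vector product are all assumptions made for this example. The update rules follow the commonly cited minimization form of SMD, whereas the paper works with noisy reward gradients.

```python
from typing import Callable, List, Tuple

Vector = List[float]

# Hypothetical toy problem: an ill-conditioned quadratic with curvatures 1 and 10.
CURVATURE: Vector = [1.0, 10.0]


def quadratic_loss(theta: Vector) -> float:
    """Toy loss f(theta) = 0.5 * sum(a_i * theta_i**2)."""
    return 0.5 * sum(a * t * t for a, t in zip(CURVATURE, theta))


def quadratic_grad(theta: Vector) -> Vector:
    """Gradient of the toy loss."""
    return [a * t for a, t in zip(CURVATURE, theta)]


def hessian_vector_product(
    grad_fn: Callable[[Vector], Vector],
    theta: Vector,
    v: Vector,
    eps: float = 1e-6,
) -> Vector:
    """Approximate H v by differencing two gradients; no Hessian is formed."""
    g0 = grad_fn(theta)
    g1 = grad_fn([t + eps * vi for t, vi in zip(theta, v)])
    return [(b - a) / eps for a, b in zip(g0, g1)]


def smd_minimize(
    grad_fn: Callable[[Vector], Vector],
    theta: Vector,
    steps: int = 200,
    gain0: float = 0.01,   # assumed initial per-parameter gain
    mu: float = 0.05,      # assumed meta-gain
    lam: float = 1.0,      # assumed decay factor for the auxiliary vector v
) -> Tuple[Vector, Vector]:
    """Gradient descent with one adaptive gain per parameter."""
    gains = [gain0] * len(theta)
    v = [0.0] * len(theta)  # tracks how theta responds to changes in the gains
    for _ in range(steps):
        g = grad_fn(theta)
        hv = hessian_vector_product(grad_fn, theta, v)
        # Multiplicative gain update, limited to at most halving per step.
        gains = [p * max(0.5, 1.0 - mu * gi * vi)
                 for p, gi, vi in zip(gains, g, v)]
        # Auxiliary vector update uses the Hessian-vector product at theta.
        v = [lam * vi - p * (gi + lam * hvi)
             for vi, p, gi, hvi in zip(v, gains, g, hv)]
        # Parameter step with the freshly adapted gains.
        theta = [t - p * gi for t, p, gi in zip(theta, gains, g)]
    return theta, gains


if __name__ == "__main__":
    start = [1.0, 1.0]
    final_theta, final_gains = smd_minimize(quadratic_grad, start)
    print("initial loss:", quadratic_loss(start))
    print("final loss:", quadratic_loss(final_theta))
    print("final gains:", final_gains)
```

The finite-difference product costs one extra gradient evaluation per step, which is what keeps this kind of curvature information cheap; an automatic-differentiation library could supply the same product exactly.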

algorithm, gradient, omdp



Country:
- Oceania > Australia
  - Australian Capital Territory > Canberra (0.04)
- North America > United States
  - New York > New York County
    - New York City (0.04)
  - Florida > Monroe County
    - Key West (0.04)
- Europe > United Kingdom
  - Scotland > City of Edinburgh
    - Edinburgh (0.04)
  - England > Cambridgeshire
    - Cambridge (0.04)

Genre:
- Research Report > New Finding (0.34)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (0.90)
  - Machine Learning
    - Statistical Learning (0.96)
    - Learning Graphical Models > Undirected Networks
      - Markov Models (0.32)
