Kalman Temporal Differences

Oct-29-2010–Journal of Artificial Intelligence Research

Because reinforcement learning suffers from a lack of scalability, online value (and Q-) function approximation has received increasing interest over the last decade. This contribution introduces a novel approximation scheme, the Kalman Temporal Differences (KTD) framework, which exhibits the following features: sample efficiency, non-linear approximation, non-stationarity handling, and uncertainty management. A first KTD-based algorithm is provided for deterministic Markov Decision Processes (MDPs); it produces biased estimates when transitions are stochastic. Then the eXtended KTD framework (XKTD), which handles stochastic MDPs, is described. Convergence is analyzed in special cases for both deterministic and stochastic transitions. The resulting algorithms are evaluated on classical benchmarks. They compare favorably to the state of the art while exhibiting the announced features.
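To make the idea in the abstract more concrete, here is a minimal sketch of a Kalman-filter-style temporal-difference update. It is not the paper's algorithm: it assumes a linear value function, a random-walk model for the parameters, and deterministic transitions, whereas the framework described above also targets non-linear approximation. The function names (`kalman_td_update`, `chain_demo`), the noise settings, and the toy three-state chain are all hypothetical choices made for illustration.

```python
def kalman_td_update(theta, P, phi_s, phi_next, reward,
                     gamma=0.9, process_noise=0.0, obs_noise=1.0):
    """One Kalman-style TD update for a linear value function V(s) = phi(s) . theta.

    Assumption (illustrative, not from the source): the reward is treated as a
    noisy scalar observation r = (phi(s) - gamma * phi(s')) . theta + noise.
    """
    n = len(theta)
    # Observation vector implied by the Bellman equation for a deterministic transition.
    h = [phi_s[i] - gamma * phi_next[i] for i in range(n)]
    # Prediction step: random-walk parameter model only inflates the covariance.
    P_pred = [[P[i][j] + (process_noise if i == j else 0.0) for j in range(n)]
              for i in range(n)]
    # Innovation variance and Kalman gain (scalar observation, so no matrix inverse).
    Ph = [sum(P_pred[i][j] * h[j] for j in range(n)) for i in range(n)]
    innovation_var = sum(h[i] * Ph[i] for i in range(n)) + obs_noise
    gain = [Ph[i] / innovation_var for i in range(n)]
    # Correction step: the innovation is the temporal-difference error.
    td_error = reward - sum(h[i] * theta[i] for i in range(n))
    new_theta = [theta[i] + gain[i] * td_error for i in range(n)]
    new_P = [[P_pred[i][j] - gain[i] * gain[j] * innovation_var for j in range(n)]
             for i in range(n)]
    return new_theta, new_P


def chain_demo(episodes=200, gamma=0.9):
    """Toy deterministic chain s0 -> s1 -> terminal, with reward 1 on the last step."""
    features = {0: [1.0, 0.0], 1: [0.0, 1.0], "end": [0.0, 0.0]}
    transitions = [(0, 1, 0.0), (1, "end", 1.0)]  # (state, next_state, reward)
    theta = [0.0, 0.0]
    P = [[10.0, 0.0], [0.0, 10.0]]  # broad prior over the parameters
    for _ in range(episodes):
        for s, s_next, r in transitions:
            theta, P = kalman_td_update(theta, P, features[s], features[s_next],
                                        r, gamma=gamma)
    return theta
```

In this toy chain the true values are V(s1) = 1 and V(s0) = 0.9 for a discount of 0.9, so `chain_demo()` should return parameters close to those. The covariance `P` is what gives an uncertainty estimate alongside the value estimate, and a non-zero `process_noise` is one simple way to let the estimate track a non-stationary target.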

algorithm, equation, value function

Country:
- North America
  - United States
    - Tennessee > Davidson County
      - Nashville (0.04)
    - Pennsylvania > Allegheny County
      - Pittsburgh (0.04)
    - Oregon > Multnomah County
      - Portland (0.04)
    - New York > New York County
      - New York City (0.04)
    - California > San Francisco County
      - San Francisco (0.14)
  - Canada
    - Quebec > Montreal (0.04)
    - Ontario > Hamilton (0.04)
    - British Columbia > Metro Vancouver Regional District
      - Vancouver (0.14)
- Europe
  - Italy > Sardinia (0.04)
  - Greece (0.04)
  - United Kingdom > England
    - Oxfordshire > Oxford (0.04)
  - Russia > Central Federal District
    - Moscow Oblast > Moscow (0.04)
  - France > Hauts-de-France
    - Nord > Lille (0.04)
  - Finland > Lapland
    - Kittilä (0.04)
- Asia
  - Russia (0.04)
  - Thailand > Bangkok
    - Bangkok (0.04)

Genre:
- Research Report (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Machine Learning
    - Reinforcement Learning (1.00)
    - Learning Graphical Models > Undirected Networks
      - Markov Models (0.48)