Finite Sample Analysis of LSTD with Random Projections and Eligibility Traces

Li, Haifang, Xia, Yingce, Zhang, Wensheng

May-25-2018–arXiv.org Artificial Intelligence

Policy evaluation, commonly referred to as value function approximation, is a central component of many reinforcement learning (RL) algorithms [27]; its task is to estimate the value function of a fixed policy in a discounted Markov Decision Process (MDP) environment. The value function specifies, for each state, the accumulated reward an agent would receive in the future by following the fixed policy from that state. Value functions have been widely investigated in RL applications, and they can provide insightful and important information for the agent to obtain an optimal policy, such as important board configurations in Go [24], failure probabilities of large telecommunication networks [9], taxi-out times at large airports [2], and so on. Although value functions can be approximated in different ways, the simplest form, linear approximation, is still widely adopted and studied because of its good generalization ability, relatively efficient computation, and solid theoretical guarantees [27, 7, 13, 16]. Temporal Difference (TD) learning is a common approach to policy evaluation with linear function approximation [27]. Typical TD algorithms can be divided into two categories: gradient-based methods (e.g., GTD(λ) [28]) and least-squares (LS) based methods (e.g., LSTD(λ) [4]). Good surveys of these algorithms can be found in [17, 6, 12, 7, 13].

With the development of information technologies, high-dimensional data is widely seen in RL applications [26, 30, 23], which poses serious challenges for designing scalable and computationally efficient algorithms for the linear value function approximation problem. To address this practical issue, several approaches have been developed for efficient and effective value function approximation.
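To make the least-squares TD approach mentioned above concrete, the following is a minimal sketch of a standard textbook form of LSTD(λ) with linear features and an accumulating eligibility trace. It is not necessarily the exact variant analyzed in the paper. The function name `lstd_lambda`, the small `ridge` term added for numerical stability, and the two-state toy chain are all assumptions introduced here for illustration.

```python
import numpy as np


def lstd_lambda(features, rewards, gamma=0.9, lam=0.5, ridge=1e-6):
    """Estimate theta such that V(s) is approximately phi(s) @ theta.

    features: array of shape (T + 1, d), rows are phi(s_0), ..., phi(s_T)
    rewards:  array of shape (T,), rewards[t] is received on s_t -> s_{t+1}
    ridge:    hypothetical stabiliser added to A (not from the source)
    """
    features = np.asarray(features, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    d = features.shape[1]

    A = ridge * np.eye(d)   # accumulates z_t (phi_t - gamma * phi_{t+1})^T
    b = np.zeros(d)         # accumulates z_t * r_t
    z = np.zeros(d)         # eligibility trace

    for t in range(len(rewards)):
        phi, phi_next = features[t], features[t + 1]
        z = gamma * lam * z + phi                 # decay the trace, add current features
        A += np.outer(z, phi - gamma * phi_next)
        b += z * rewards[t]

    return np.linalg.solve(A, b)


# Toy example (an assumption): a deterministic chain 0 -> 1 -> 0 -> ...
# with reward 1 when leaving state 0 and reward 0 when leaving state 1.
states = [t % 2 for t in range(201)]
features = np.eye(2)[states]                      # one-hot (tabular) features
rewards = np.array([1.0 if s == 0 else 0.0 for s in states[:-1]])
theta = lstd_lambda(features, rewards, gamma=0.9, lam=0.5)
print(theta)
```

With one-hot features on this deterministic chain, the true values are V(0) = 1/(1 - γ²) and V(1) = γ/(1 - γ²), so the estimate should land very close to those numbers; the only deviation comes from the small ridge term. For high-dimensional features, the d × d system solved here is the computational bottleneck that motivates the scalability concerns raised in the text.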

Country:
- North America
  - Canada > Alberta (0.14)
  - United States > Florida (0.04)
- Europe > Germany
  - Hesse > Darmstadt Region > Darmstadt (0.04)
- Asia
  - Middle East > Jordan (0.04)
  - China
    - Beijing > Beijing (0.04)
    - Anhui Province > Hefei (0.04)

Genre:
- Research Report > New Finding (0.46)

Industry:
- Leisure & Entertainment > Games (0.87)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Machine Learning
    - Reinforcement Learning (1.00)
    - Statistical Learning > Gradient Descent (0.34)
    - Learning Graphical Models > Undirected Networks
      - Markov Models (0.34)
