Finite Sample Analysis of LSTD with Random Projections and Eligibility Traces
Li, Haifang, Xia, Yingce, Zhang, Wensheng
arXiv.org Artificial Intelligence
Policy evaluation, commonly referred to as value function approximation, is a central part of many reinforcement learning (RL) algorithms [27]. Its task is to estimate value functions for a fixed policy in a discounted Markov Decision Process (MDP) environment. The value function of each state specifies the accumulated reward an agent would receive in the future by following the fixed policy from that state. Value functions have been widely investigated in RL applications, and they can provide the agent with insightful information for obtaining an optimal policy, such as important board configurations in Go [24], failure probabilities of large telecommunication networks [9], taxi-out times at large airports [2] and so on. Although value functions can be approximated in different ways, the simplest form, linear approximation, is still widely adopted and studied due to its good generalization ability, relatively efficient computation and solid theoretical guarantees [27, 7, 13, 16]. Temporal Difference (TD) learning is a common approach to the problem of policy evaluation with linear function approximation [27]. Typical TD algorithms can be divided into two categories: gradient-based methods (e.g., GTD(λ) [28]) and least-squares (LS) based methods (e.g., LSTD(λ) [4]). Good surveys of these algorithms can be found in [17, 6, 12, 7, 13].

With the development of information technologies, high-dimensional data is widely seen in RL applications [26, 30, 23], which poses serious challenges for designing scalable and computationally efficient algorithms for the linear value function approximation problem. To address this practical issue, several approaches have been developed for efficient and effective value function approximation.
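To make the least-squares TD idea mentioned above more concrete, the following is a minimal sketch of LSTD(λ) with linear features, using the standard eligibility-trace recursion z_t = γλ z_{t-1} + φ(s_t) and solving A θ = b with A = Σ z_t (φ(s_t) − γ φ(s_{t+1}))ᵀ and b = Σ z_t r_t. The function names (`solve`, `lstd_lambda`), the small ridge term, the two-state toy chain and the one-hot feature map are all assumptions made for illustration; they are not taken from the paper, and the sketch does not include the random projections the paper studies.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [A | b] on copies of the rows.
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def lstd_lambda(transitions, phi, gamma, lam, ridge=1e-6):
    """Estimate theta such that V(s) is approximated by phi(s) . theta.

    transitions: list of (s, r, s_next) along one trajectory of the fixed policy.
    ridge: small diagonal term (an assumption here) that keeps A invertible.
    """
    d = len(phi(transitions[0][0]))
    A = [[ridge if i == j else 0.0 for j in range(d)] for i in range(d)]
    b = [0.0] * d
    z = [0.0] * d  # eligibility trace
    for s, r, s_next in transitions:
        f, f_next = phi(s), phi(s_next)
        # Trace update: z <- gamma * lam * z + phi(s)
        z = [gamma * lam * zi + fi for zi, fi in zip(z, f)]
        diff = [fi - gamma * fn for fi, fn in zip(f, f_next)]
        for i in range(d):
            b[i] += z[i] * r
            for j in range(d):
                A[i][j] += z[i] * diff[j]
    return solve(A, b)


# Toy deterministic chain (assumed): 0 -> 1 with reward 1, 1 -> 0 with reward 0.
transitions = []
s = 0
for _ in range(200):
    s_next = 1 - s
    r = 1.0 if s == 0 else 0.0
    transitions.append((s, r, s_next))
    s = s_next


def phi(state):
    """One-hot (tabular) features, a special case of linear approximation."""
    return [1.0, 0.0] if state == 0 else [0.0, 1.0]


theta = lstd_lambda(transitions, phi, gamma=0.5, lam=0.5)
```

For this toy chain with discount 0.5, the exact values satisfy V(0) = 1 + 0.5 V(1) and V(1) = 0.5 V(0), so V(0) = 4/3 and V(1) = 2/3; with one-hot features the estimate `theta` should match these up to the small bias introduced by the ridge term.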
May-25-2018
- Country:
  - North America
    - Canada > Alberta (0.14)
    - United States > Florida (0.04)
  - Europe > Germany
    - Hesse > Darmstadt Region > Darmstadt (0.04)
  - Asia
    - Middle East > Jordan (0.04)
    - China
      - Beijing > Beijing (0.04)
      - Anhui Province > Hefei (0.04)
- Genre:
  - Research Report > New Finding (0.46)
- Industry:
  - Leisure & Entertainment > Games (0.87)