Distributed Policy Evaluation Under Multiple Behavior Strategies

Macua, Sergio Valcarcel; Chen, Jianshu; Zazo, Santiago; Sayed, Ali H.

Nov-5-2014, arXiv.org Artificial Intelligence

We apply diffusion strategies to develop a fully-distributed cooperative reinforcement learning algorithm in which agents in a network communicate only with their immediate neighbors to improve predictions about their environment. The algorithm can also be applied to off-policy learning, meaning that the agents can predict the response to a behavior different from the actual policies they are following. The proposed distributed strategy is efficient, with linear complexity in both computation time and memory footprint. We provide a mean-square-error performance analysis and establish convergence under constant step-size updates, which endow the network with continuous learning capabilities. The results show a clear gain from cooperation: when the individual agents can estimate the solution, cooperation increases stability and reduces bias and variance of the prediction error; but, more importantly, the network is able to approach the optimal solution even when none of the individual agents can (e.g., when the individual behavior policies restrict each agent to sample a small portion of the state space).
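
The cooperation gain described in the abstract can be illustrated with a toy adapt-then-combine diffusion update under a constant step size. This is only a sketch under stated assumptions and not the authors' algorithm: it replaces off-policy temporal-difference learning with noiseless regression onto known per-state targets, and the ring topology, uniform combination weights, step size, and all names (TRUE_VALUES, adapt, combine, run) are hypothetical choices made for illustration. Each agent only ever visits one state, so no isolated agent can recover the full value vector, yet the network can once neighbors average their intermediate estimates.

```python
# Toy adapt-then-combine diffusion sketch (NOT the paper's algorithm).
# Assumptions: 4 agents on a ring, 4 states, tabular features, noiseless targets.

TRUE_VALUES = [1.0, -2.0, 0.5, 3.0]  # hypothetical per-state values
NUM_AGENTS = 4
STEP_SIZE = 0.5                      # constant step size
NUM_ITERS = 300


def neighbors(k):
    """Ring topology: agent k exchanges estimates with itself and two adjacent agents."""
    return [(k - 1) % NUM_AGENTS, k, (k + 1) % NUM_AGENTS]


def adapt(estimate, state):
    """Local gradient step on the single state this agent's behavior visits."""
    updated = list(estimate)
    updated[state] += STEP_SIZE * (TRUE_VALUES[state] - estimate[state])
    return updated


def combine(intermediate, k):
    """Uniform average of the intermediate estimates in agent k's neighborhood."""
    group = [intermediate[l] for l in neighbors(k)]
    return [sum(column) / len(group) for column in zip(*group)]


def run(cooperate):
    """Run the network; agent k's behavior only ever samples state k."""
    estimates = [[0.0] * len(TRUE_VALUES) for _ in range(NUM_AGENTS)]
    for _ in range(NUM_ITERS):
        intermediate = [adapt(estimates[k], k) for k in range(NUM_AGENTS)]
        if cooperate:
            estimates = [combine(intermediate, k) for k in range(NUM_AGENTS)]
        else:
            estimates = intermediate
    return estimates


def max_error(estimates):
    """Largest absolute deviation from TRUE_VALUES over all agents and states."""
    return max(abs(w - v) for est in estimates for w, v in zip(est, TRUE_VALUES))


isolated = run(cooperate=False)
cooperative = run(cooperate=True)
```

In this toy setting, comparing max_error(isolated) with max_error(cooperative) shows the effect: without cooperation each agent learns only the entry for the state it visits, while with neighbor averaging every agent's estimate approaches the full vector. Memory and per-iteration work for each agent grow linearly with the length of the estimate vector, which is the kind of cost profile the abstract refers to.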

artificial intelligence, machine learning, reinforcement learning

Country:
- Europe (1.00)
- Asia (0.67)
- North America
  - Canada (0.67)
  - United States > California
    - Los Angeles County > Los Angeles (0.28)

Genre:
- Research Report (0.70)

Industry:
- Education (0.48)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Reinforcement Learning (1.00)
  - Representation & Reasoning
    - Optimization (1.00)
    - Agents (1.00)
