Reinforcement Learning Without Backpropagation or a Clock

James Kostas, Chris Nota, Philip S. Thomas

arXiv.org Machine Learning 

Reinforcement learning (RL) algorithms share qualitative similarities with the algorithms implemented by animal brains. However, there remain clear differences between these two types of algorithms. For example, while RL algorithms using artificial neural networks require information to flow backwards through the network via the backpropagation algorithm, there is ongoing debate about whether this is feasible in biological neural implementations (Werbos and Davis, 2016). Policy gradient coagent networks (PGCNs) are a class of RL algorithms that were introduced to remove this possibly biologically implausible property: they use artificial neural networks but do not use the backpropagation algorithm (Thomas, 2011). Since their introduction, PGCN algorithms have proven to be not only a possible improvement in biological plausibility, but also a practical tool for improving RL agents. They have been used to solve RL problems with high-dimensional action spaces (Thomas and Barto, 2012), are the RL precursor to the more general stochastic computation graphs (Schulman et al., 2015), and, as we show in this paper, generalize the recently proposed option-critic architecture (Bacon et al., 2017) while drastically simplifying key derivations.
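To make the backpropagation-free idea concrete, the following is a minimal illustrative sketch of a coagent-style update, not the paper's algorithm: each stochastic unit ("coagent") treats the rest of the network as part of its environment and applies its own local REINFORCE update using only its sampled output and the shared reward, so no gradients ever flow between units. The two-unit architecture, the toy one-step reward, and all variable names here are our own illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative two-coagent network (architecture is an assumption,
# not taken from the paper): coagent 1 maps the input to a Bernoulli
# hidden unit; coagent 2 maps that unit to a Bernoulli action.
w1 = 0.0           # coagent 1 weight
w2, b2 = 0.0, 0.0  # coagent 2 weight and bias
alpha, baseline = 0.5, 0.0

for _ in range(2000):
    x = 1.0                            # one-step toy problem; reward favors action 1
    p1 = sigmoid(w1 * x)
    h = float(rng.random() < p1)       # coagent 1 samples its activation locally
    u = 2.0 * h - 1.0                  # signed input seen by coagent 2
    p2 = sigmoid(w2 * u + b2)
    a = float(rng.random() < p2)       # coagent 2 samples the action locally
    R = 1.0 if a == 1.0 else 0.0       # shared scalar reward

    adv = R - baseline                 # advantage against a running baseline
    baseline += 0.05 * (R - baseline)
    # Local REINFORCE updates: each coagent uses only its own sample
    # and the shared reward -- no gradients cross between the units.
    w1 += alpha * adv * (h - p1) * x
    w2 += alpha * adv * (a - p2) * u
    b2 += alpha * adv * (a - p2)
```

After training, coagent 2 has learned to emit action 1 with high probability; the hidden coagent's weight may drift either way, since the toy reward does not depend on it. The point of the sketch is only that the whole network improves even though each unit's update is purely local.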
