GAN Q-learning – Arxiv Vanity

May-25-2018, 15:21:42 GMT–#artificialintelligence

Up to now, deep learning methods in RL have used multiple function approximators (typically a network with shared hidden layers) to fit a state value or state-action value distribution. For instance, bootstrappedDQN used k heads on the state-action value function Q, each covering every available action, and used them together to model a distribution. In bayesianpol, a Bayesian framework was applied to the actor-critic architecture by fitting a Gaussian Process (GP) in place of the critic, which allows the update rules to be derived in closed form. More recently, bellemare2017distributional introduced C51, a distributional algorithm that approaches the RL problem by learning a categorical probability vector over the returns Q. Unlike GANRL, which uses a generative network to learn the underlying transition model of the environment, we use a generative network to approximate the distribution of the Bellman updates.
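To make the idea of a categorical probability vector over returns more concrete, the following is a minimal sketch in plain Python of how such a representation can be turned back into scalar Q-values. It is not taken from the paper: the helper names (`make_support`, `expected_q`, `greedy_action`), the support range of -10 to 10, and the 51 atoms are assumptions chosen for illustration only.

```python
def make_support(v_min, v_max, n_atoms):
    """Evenly spaced return values (atoms) between v_min and v_max."""
    step = (v_max - v_min) / (n_atoms - 1)
    return [v_min + i * step for i in range(n_atoms)]


def expected_q(probs, support):
    """Scalar Q-value: the mean of the categorical return distribution."""
    return sum(p * z for p, z in zip(probs, support))


def greedy_action(action_probs, support):
    """Pick the action whose return distribution has the highest mean."""
    q_values = [expected_q(p, support) for p in action_probs]
    best = max(range(len(q_values)), key=q_values.__getitem__)
    return best, q_values


# Hypothetical setup: 51 atoms on [-10, 10], two actions.
support = make_support(-10.0, 10.0, 51)
uniform = [1.0 / 51] * 51        # action 0: no information about the return
peaked = [0.0] * 50 + [1.0]      # action 1: all mass on the largest atom
best, q_values = greedy_action([uniform, peaked], support)
```

In this toy setup the second action is selected, because all of its probability mass sits on the largest atom. A full distributional agent would also need a rule for updating the probability vectors from Bellman targets, which this sketch leaves out.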

arxiv vanity, deep learning, reinforcement learning



Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Reinforcement Learning (0.76)
  - Neural Networks > Deep Learning (0.61)