Reviews: Genetic-Gated Networks for Deep Reinforcement Learning

Neural Information Processing Systems 

The authors propose a new RL framework that combines gradient-free genetic algorithms with gradient based optimization (policy gradients). The idea is to parameterize an ensemble of actors by using a binary gating mechanism, similar to dropout, between hidden layers. Instead of sampling a new gate pattern at every iteration, as in dropout, each gate is viewed as a gene and the activation pattern as a chromosome. This allows learning the policy with a combination of a genetic algorithm and policy gradients. The authors apply the proposed algorithm to Atari domain, and the results demonstrate significant improvement over standard algorithms. They also apply their method to continuous control (OpenAI gym MuCoJo benchmarks) yielding results that are comparable to standard PPO.