Augment-Reinforce-Merge Policy Gradient for Binary Stochastic Policy
