Variance Reduction for Policy Gradient with Action-Dependent Factorized Baselines