On the Generalization Gap in Reparameterizable Reinforcement Learning