Rethinking Action Spaces for Reinforcement Learning in End-to-end Dialog Agents with Latent Variable Models